Abstract


The detection of the accelerated expansion of the Universe has been one of the major breakthroughs in 
modern cosmology. Several cosmological probes (Cosmic Microwave Background, Supernovae Type Ia, Baryon 
Acoustic Oscillations) have been studied in depth to better understand the nature of the mechanism driving this 
acceleration, and they are currently being pushed to their limits, yielding remarkable constraints that have allowed
us to shape the standard cosmological model. In parallel to that, however, the percent precision achieved has 
recently revealed apparent tensions between measurements obtained from different methods. These are either 
indicating some unaccounted systematic effects, or are pointing toward new physics. Following the development 
of CMB, SNe, and BAO cosmology, it is critical to extend our selection of cosmological probes. Novel probes 
can be exploited to validate results, control or mitigate systematic effects, and, most importantly, to increase the 
accuracy and robustness of our results. 

This review is meant to provide a state-of-the-art benchmark of the latest advances in emerging “beyond-standard”
cosmological probes. We present how several different methods can become a key resource for observational cosmology.
In particular, we review cosmic chronometers, quasars, gamma-ray bursts, standard sirens, lensing time-delay
with galaxies and clusters, cosmic voids, neutral hydrogen intensity mapping, surface brightness fluctuations,
secular redshift drift, and clustering of standard candles. The review describes the method, systematics, and results
of each probe in a homogeneous way, giving the reader a clear picture of the available innovative methods that 
have been introduced in recent years and how to apply them. The review also discusses the potential synergies 
and complementarities between the various probes, exploring how they will contribute to the future of modern
cosmology.


1 Introduction


The discovery of the accelerated expansion of the Universe (Perlmutter et al., 1998, 1999; Riess et al., 1998) has 
been one of the major breakthroughs in modern cosmology, and also in physics in general. The general framework
established in the previous century, where the entire evolution of the Universe was thought to be dominated by 
matter and radiation, needed to readjust to make space for a new form of energy with negative pressure that can 
be responsible for this acceleration (that was named dark energy), or, alternatively, to account for some breaking of 
the well-known general relativity at very large scales. Driven by these pioneering results, in the subsequent decades 
the scientific and technical efforts of the scientific community were dedicated to the study of methods to measure 
and characterize this accelerated expansion, and to the development of large facilities providing massive datasets 
to be analyzed. In this process, a few of these methods, also referred to as cosmological probes, have become 
standard approaches in the cosmological analysis given the large efforts spent in measurements, theoretical analyses, 
systematics characterization, and also investments. 


A comprehensive review on these methods is provided in Huterer and Shafer (2018). Here we just recall that most 
of these approaches are based on the determination of some standard properties of astrophysical objects that can be 
used to calibrate observations and measure the expansion history of the Universe. In particular, it was discovered 
that the peculiar physical characteristics of some objects allow us to infer a-priori their absolute luminosity, making 
them standard candles (or standardizable candles) with which it became possible to measure their luminosity distance. 
Locally, it was found that some stars have a variable luminosity (Cepheids, RR-Lyrae) whose period of variability 
can be used to determine precisely their absolute luminosity; detached eclipsing binaries have also been used as local
distance indicators to determine the distance to the Large Magellanic Cloud (LMC) to <1% (Pietrzyński et al.,
2019). At larger distances, it was discovered that the stars at the Tip of the Red Giant Branch (TRGB), easily
identifiable in the upper part of the Hertzsprung-Russell diagram, can also be used as standard candles, having an
almost constant I-band magnitude (Lee et al., 1993). Finally, at cosmological distances, Type Ia Supernovae (SNe) 
have been found to be ideal standardizable candles, since their peak luminosity is found to strictly correlate with 
their absolute luminosity after a proper calibration (Phillips, 1993), allowing us to probe the Universe with precise 
distance indicators up to z ~ 1.5. Similarly, the analysis of large-scale structures in the Universe highlighted the 
presence of correlated over-densities in the matter distribution at a specific separation of r ~ 100 Mpc/h. This effect, 
known as Baryon Acoustic Oscillations (BAO), was clearly seen both as wiggles in the power spectrum of galaxies 
and as a peak in the two-point correlation function (Percival et al., 2001; Cole et al., 2005; Eisenstein et al., 2005), 
and can be interpreted as the imprint of the sound horizon in the original fluctuations in the photon-baryon fluid
present in the very early Universe. These oscillations have been in particular used as a standard ruler to study the 
expansion history of the Universe. As a parallel effort, the observation and study of the first light emitted in the 
Universe, the Cosmic Microwave Background (CMB) radiation, done with several ground- and space-based missions 
(Smoot et al., 1992; Bennett et al., 2003; Planck Collaboration et al., 2014a; Swetz et al., 2011; Carlstrom et al., 
2011) gave us a privileged view of the early Universe, providing fundamental insights into the process of formation and
on the main components at those early times. In addition to those, other cosmological probes have been widely used
in the past decades to constrain the expansion of the Universe and the evolution of the matter within it. Amongst 
the most important ones, here we just mention the weak gravitational lensing (see, e.g., Bartelmann and Schneider, 
2001) and the properties of massive clusters of galaxies, in particular the cluster counts (see, e.g., Allen et al., 2011). 


While CMB, BAO, SNe, and the other previously quoted probes have increasingly gained interest in these years 
in the cosmological community, it soon became clear also that a single probe is not sufficient to constrain accurately 
and precisely the properties of the components of the Universe. Ultimately, each probe has its own strengths and 
weaknesses, being sensitive to specific combinations of cosmological parameters, to specific physical processes, to a specific
range of cosmic time, and being affected by a specific set of systematics. In the end, the only road to move forward in our
knowledge of the Universe is found to reside in the combination between complementary cosmological probes, allowing 
us to break degeneracies between the estimate of parameters, and also to keep under control systematic effects (see, 
e.g., Scolnic et al., 2018). This point was clearly first highlighted in the Dark Energy Task Force report (Albrecht 
et al., 2006), and since then the effort of the scientific community proceeded towards that direction, also with space 
missions specifically designed to take advantage of the synergy between different probes¹.


¹ As an example, the ESA space mission Euclid (Laureijs et al., 2011) will study the expansion history of the Universe and the growth
of the structures within, taking advantage of the combination of two cosmological probes, galaxy clustering and weak lensing.


With the development of these cosmological probes, the era of precision cosmology soon began, where the
advances in the instrumental technology, supported by a more mature assessment and reduction of systematic
uncertainties and by an increasing volume of data, led to percent and sub-percent measurements of cosmological
parameters. However, instead of eventually closing all the questions related to the nature of the accelerated expansion of our
Universe and of its constituents, this newly achieved accuracy actually opened Pandora’s box even further. One of
the most pressing issues is that the Hubble constant $H_0$ as determined from early-Universe probes (CMB) appears to
be in significant disagreement with the estimates provided by late-Universe probes (Cepheids, TRGB, masers, ...).
Many analyses addressed whether this might be due to some systematics hidden in either measurement, but, at
present, this seems disfavored (Riess et al., 2011, 2016; Bernal et al., 2016; Di Valentino et al., 2016; Efstathiou,
2020; Riess et al., 2020; Di Valentino et al., 2021; Efstathiou, 2021; Riess et al., 2021a; Dainotti et al., 2021; Riess
et al., 2021b). At the same time, differences started arising also in other cosmological parameters as estimated from
early- and late-Universe probes, such as the tension in the estimate of the matter energy density $\Omega_m$ and of $\sigma_8$, the
matter power spectrum normalization at 8 $h^{-1}$ Mpc, often summarized in the quantity $S_8 = \sigma_8\sqrt{\Omega_m/0.3}$ (Heymans
et al., 2013; MacCrann et al., 2015; Joudaki et al., 2017; Hildebrandt et al., 2017; Asgari et al., 2020; Park and Rozo,
2020; Joudaki et al., 2020; Tröster et al., 2021; Asgari et al., 2021; Heymans et al., 2021; Amon et al., 2021; Secco
et al., 2021; DES Collaboration et al., 2021). All these constraints are pointing toward significant differences of the
order of 4-5σ, and if it is confirmed that this is not attributable to some problems with the data, this may open
the road to new physics with which to explain such discrepancies in the measurement of the same quantity probing
different cosmic times.


Now that the precision in many standard cosmological probes is close to reaching its maximum, given the current 
analyses or the ones planned in the near future, a way to take a step forward in our understanding of the Universe is 
to look for new independent cosmological probes (as also highlighted by Verde et al., 2019; Di Valentino et al., 2021), 
that could either confirm the discrepancies found, pointing us toward the need for new models, or refute them, helping
us to better understand possible systematics, or unknown unknowns. Moreover, the synergy and complementarity
between different probes can also help to reduce, when different probes are combined, the uncertainty on cosmological 
parameters. In general, the diversity between different methods will not only enrich the panorama of ways to look 
at and study our Universe, but also possibly open new observational and theoretical windows, as happened in the 
past with the study of CMB, SNe, and BAO. 


This is an exciting time for cosmology, and in this review we aim to provide a state-of-the-art overview of the new
emerging cosmological probes, discussing how to apply them, the systematics involved, the measurements obtained, and
the forecasts of how they could contribute to understanding the evolution of the Universe. In particular, we will review
cosmic chronometers, quasars, gamma-ray bursts, gravitational waves as standard sirens, time-delay cosmography,
cluster strong lensing, cosmic voids, neutral hydrogen intensity mapping, surface brightness fluctuations, stellar ages,
secular redshift drift, and clustering of standard candles. In Sect. 2 we will provide a general overview of the basic
notation and fundamental equations assumed in the review, in Sect. 3 we will discuss separately each emerging
cosmological probe, in Sect. 4 we will discuss the synergy and complementarity between the various described cosmological
probes, and in Sect. 5 we will draw our conclusions.


2 Notations and fundamental equations 


One of the main assumptions in modern cosmology is the cosmological principle, which describes our Universe at 
very large scales based on two main premises: homogeneity (the Universe is the same at every position) and
isotropy (there is no preferential spatial direction). Under this principle, the space-time metric can be described by 
the Friedmann-Lemaitre-Robertson-Walker (FLRW) metric: 
where a(t) is the scale factor, which describes how the universe is expanding, relating physical and comoving distances
as R(t) = a(t)r, c is the speed of light, θ and φ are the angles describing the spherical coordinates, and k is the
parameter describing the curvature of space; in particular, k = 0 corresponds to a flat universe described by a
Euclidean geometry, a positive k > 0 to a closed universe with a spherical geometry, and a negative k < 0 to an open
universe with a hyperbolic geometry. Within a FLRW metric, it is also possible to relate the scale factor with the
redshift z, having:


If we define the expansion rate of the universe H(t) as the rate with which the scale factor evolves with time,
$H(t) = \dot{a}(t)/a(t)$, we can describe how it evolves with cosmic time t through the Friedmann equations:
where G is the gravitational constant, $\rho$ and $p$ are the total energy density and pressure, $\Lambda$ is the cosmological
constant, and the dot indicates a derivative with respect to time. Historically, a critical value of density producing
a flat universe has been defined by setting the curvature term of Eq. 3 to zero in the absence of a $\Lambda$ term, obtaining
$\rho_{\rm crit} = 3H^2/(8\pi G)$. This quantity has proven to be extremely useful to define dimensionless density parameters
for the various constituents of the universe as $\Omega_i = \rho_i/\rho_{\rm crit}$. This allows us to write the total energy density
of the universe as the sum of the contributions of various components, namely matter and radiation; analogously,
considering the terms on the right-hand side of Eq. 3, we can define an energy density for the curvature,
$\Omega_k = -kc^2/(a^2H^2)$, and for dark energy (in the case of a Cosmological Constant), $\Omega_\Lambda = \Lambda c^2/(3H^2)$.
In this way, we have:


$$ 1 = \sum_i \Omega_i = \Omega_m + \Omega_r + \Omega_k + \Omega_\Lambda , \qquad (5) $$


where the density parameters are here defined at any given time, so as a function of redshift z. In this context, it is 
also useful to define the equation of state (EoS) parameter of a generic component as the value w relating its pressure 
and density, $w = p/\rho$. In general, we can express the evolution of the energy density as:


$$ \rho_i(z) = \rho_{i,0} \exp\left\{ 3 \int_0^z \frac{1 + w_i(z')}{1 + z'} \, dz' \right\} . \qquad (6) $$
While the EoS could depend on time, we recall here that the different components have different EoS parameters,
namely w = 1/3 for radiation, w = 0 for matter, and w = -1 for the term we referred to as dark energy (in the case
it is a Cosmological Constant). If we consider Eq. 6 in the case of a constant $w_i$, it simplifies to:


$$ \rho_i = \rho_{i,0} (1+z)^{3(1+w_i)} . \qquad (7) $$


Combining Friedmann equations 3 and 4 with Eqs. 5 and 7, it is possible to express the expansion rate of the 
universe as a function of the evolution with redshift of its main components: 


$$ H(z) = H_0 \left[ \Omega_r (1+z)^4 + \Omega_m (1+z)^3 + \Omega_k (1+z)^2 + \Omega_{de} (1+z)^{3(1+w)} \right]^{1/2} , \qquad (8) $$


where each component evolves with a different power of (1 + z) due to the different EoS parameter of each term;
here, we implicitly assumed the density parameters defined as constant, referred to as today’s values $\Omega_{i,0}$. We will
assume this convention throughout the review, unless otherwise specified. In Eq. 8 we also introduced the dark energy
density as $\Omega_{de}$, since in this case its EoS parameter is considered having a generic value w. While, in principle, one
could take into account also the contribution of radiation $\Omega_r$ that scales as $(1+z)^4$, typically this is not considered
given the current constraint $\Omega_r \sim 2.47 \times 10^{-5} h^{-2}$ (Fixsen, 2009), and in the following we will neglect its contribution.
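As a minimal sketch of how Eq. 8 can be evaluated in practice, the following Python function computes H(z) for a constant dark energy EoS parameter. The function name and the default parameter values (H0 = 70 km/s/Mpc, Ωm = 0.3, Ωde = 0.7) are illustrative assumptions, not values adopted in this review; radiation is switched off by default, in line with the convention stated above.

```python
import math


def hubble_parameter(z, h0=70.0, omega_m=0.3, omega_k=0.0,
                     omega_de=0.7, w=-1.0, omega_r=0.0):
    """Expansion rate H(z) in km/s/Mpc following Eq. 8 (constant EoS w).

    All default values are illustrative assumptions.
    """
    zp1 = 1.0 + z
    e_squared = (omega_r * zp1**4          # radiation
                 + omega_m * zp1**3        # matter
                 + omega_k * zp1**2        # curvature
                 + omega_de * zp1**(3.0 * (1.0 + w)))  # dark energy
    return h0 * math.sqrt(e_squared)


# Example: expansion rate at z = 1 in the assumed flat LCDM-like model.
h_at_one = hubble_parameter(1.0)
```

With w = -1 the dark energy term is constant in redshift, so the function reduces to the standard ΛCDM expansion rate.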


So far, we have considered the dark energy as having a constant EoS parameter w = -1; however, to be more
generic, we can allow it to vary with cosmic time, as different cosmological models would actually suggest. The most
widely used way to parameterize this evolution is the Chevallier-Polarski-Linder (CPL) parameterization (Chevallier
and Polarski, 2001; Linder, 2003), where:


$$ w(z) = w_0 + w_a \frac{z}{1+z} . \qquad (9) $$


Considering Eqs. 6 and 9, we can therefore generalize Eq. 8 as follows: 


$$ H(z) = H_0 \left[ \Omega_r (1+z)^4 + \Omega_m (1+z)^3 + \Omega_k (1+z)^2 + \Omega_{de} (1+z)^{3(1+w_0+w_a)} e^{-3 w_a \frac{z}{1+z}} \right]^{1/2} . \qquad (10) $$


From this more general formulation where most of the cosmological parameters are left free to vary (which we will
refer to as the open $w_0w_a$CDM model, o$w_0w_a$CDM), it is possible to derive more specific cases. In the case we fix the
curvature of the universe to be flat ($\Omega_k = 0$), we will have a flat $w_0w_a$CDM model (f$w_0w_a$CDM); in case we also fix
the time evolution of the dark energy EoS to be null ($w_a = 0$), we will have a flat wCDM model (fwCDM); finally,
if we assume the dark energy EoS to be constant and equal to w = -1, we will obtain the standard ΛCDM model.
In this context, it is also useful to define the normalized Hubble parameter as: 


$$ E(z) = H(z)/H_0 . \qquad (11) $$
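As a worked step, returning to the CPL parameterization of Eq. 9, the redshift scaling of the dark energy term that enters the generalized expansion rate (and hence E(z)) can be obtained by inserting Eq. 9 into Eq. 6. The derivation below is a sketch that uses only those two equations; the subscript "de" labels the dark energy component.

```latex
\begin{align}
\rho_{de}(z) &= \rho_{de,0} \exp\left\{ 3\int_0^z \frac{1 + w_0 + w_a \frac{z'}{1+z'}}{1+z'}\, dz' \right\} \\
             &= \rho_{de,0} \exp\left\{ 3(1+w_0+w_a)\ln(1+z) - 3 w_a \frac{z}{1+z} \right\} \\
             &= \rho_{de,0}\,(1+z)^{3(1+w_0+w_a)}\, e^{-3 w_a z/(1+z)} ,
\end{align}
```

where the second line uses $z'/(1+z')^2 = 1/(1+z') - 1/(1+z')^2$. For $w_a = 0$ the result reduces to the constant-w scaling of Eq. 7.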


The previously discussed equations describe how the cosmological background evolves. From these, we can 
introduce several additional quantities that will be extremely relevant in describing astrophysical phenomena, namely 
distances and times. Following Huterer and Shafer (2018), the comoving distance can be defined as: 
It is interesting to notice that in the case of a standard flat ΛCDM cosmology, this equation can be significantly
simplified to:


$$ D(z) = c \int_0^z \frac{dz'}{H(z')} . \qquad (13) $$


From this equation, we can define two fundamental quantities in astrophysics, namely the luminosity distance $D_L(z)$
and the angular diameter distance $D_A(z)$ as:


$$ D_L(z) = (1+z) D(z) , \qquad D_A(z) = \frac{D(z)}{1+z} . \qquad (14) $$
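The distances of Eqs. 13 and 14 can be evaluated numerically in a few lines. The sketch below assumes a flat ΛCDM model with radiation neglected, integrates Eq. 13 with a composite Simpson rule, and uses illustrative parameter values (H0 = 70 km/s/Mpc, Ωm = 0.3); all function names are hypothetical and chosen only for this example.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def e_of_z(z, omega_m=0.3):
    """Normalized Hubble parameter E(z) for flat LCDM (radiation neglected)."""
    return math.sqrt(omega_m * (1.0 + z)**3 + (1.0 - omega_m))


def comoving_distance(z, h0=70.0, omega_m=0.3, n=1000):
    """Comoving distance D(z) in Mpc from Eq. 13, via Simpson's rule (n even)."""
    if z == 0.0:
        return 0.0
    step = z / n
    total = 1.0 / e_of_z(0.0, omega_m) + 1.0 / e_of_z(z, omega_m)
    for i in range(1, n):
        weight = 4.0 if i % 2 else 2.0  # Simpson weights: 4 on odd, 2 on even nodes
        total += weight / e_of_z(i * step, omega_m)
    return C_KM_S / h0 * total * step / 3.0


def luminosity_distance(z, **kwargs):
    """Luminosity distance D_L(z) = (1 + z) D(z), in Mpc."""
    return (1.0 + z) * comoving_distance(z, **kwargs)


def angular_diameter_distance(z, **kwargs):
    """Angular diameter distance D_A(z) = D(z) / (1 + z), in Mpc."""
    return comoving_distance(z, **kwargs) / (1.0 + z)
```

By construction the two distances satisfy $D_L = (1+z)^2 D_A$, which provides a quick sanity check of any implementation.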


Similarly, considering the previous definition of H(t) and considering Eq. 2, we can write: 
$$ H(z) = -\frac{1}{1+z} \frac{dz}{dt} , \qquad (15) $$


and by integrating it we obtain the expression of the age of the universe as a function of redshift:


$$ t(z) = \int_z^\infty \frac{dz'}{(1+z') H(z')} . \qquad (16) $$
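To illustrate the age integral, the sketch below evaluates the age of the universe at redshift z, $t(z) = \int_z^\infty dz'/[(1+z')H(z')]$, for a flat ΛCDM model. For numerical convenience the integral is rewritten in terms of the scale factor, $t = H_0^{-1}\int_0^{a} da'/[a' E(a')]$ with $a = 1/(1+z)$; the midpoint rule, the function name, and the parameter values are assumptions made for this example only.

```python
import math

MPC_IN_KM = 3.0856775814913673e19  # kilometres per megaparsec
GYR_IN_S = 3.15576e16              # seconds per gigayear (Julian years)


def age_at_redshift(z, h0=70.0, omega_m=0.3, n=20000):
    """Age of the universe at redshift z in Gyr, flat LCDM, radiation neglected."""
    a_max = 1.0 / (1.0 + z)
    step = a_max / n
    total = 0.0
    for i in range(n):
        a = (i + 0.5) * step  # midpoint rule, never evaluates a = 0
        # 1 / (a E(a)) with E(a) = sqrt(omega_m a^-3 + omega_lambda)
        total += math.sqrt(a / (omega_m + (1.0 - omega_m) * a**3))
    hubble_time_gyr = MPC_IN_KM / h0 / GYR_IN_S  # 1/H0 expressed in Gyr
    return hubble_time_gyr * total * step


# Example: present-day age in the assumed model.
t_today = age_at_redshift(0.0)
```

For flat ΛCDM the result can be cross-checked against the closed form $t(z) = \frac{2}{3H_0\sqrt{\Omega_\Lambda}}\,\mathrm{asinh}\left[\sqrt{\Omega_\Lambda/\Omega_m}\,(1+z)^{-3/2}\right]$.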


3 Cosmology with emerging cosmological probes 


All the new emerging cosmological probes are presented following a common scheme, introducing at the beginning 
of each section the basic idea of the method and its main equations, describing how to optimally select each probe, 
discussing how it can be (and has been) applied, reviewing the current state of the art of the measurements, and
providing forecasts on how the method is expected to improve its performance in the near future. A fundamental 
part is dedicated, in particular, to the presentation of the systematics involved in each probe, discussing how they 
impact the measurements and possible strategies to handle and mitigate them. 


3.1 Cosmic Chronometers 


The age of the Universe has been an important (derived) cosmological parameter, being closely related to the Hubble 
constant and the background parameters governing the Universe’s expansion history. Determinations of the age of the
Universe today from the age of old cosmological objects at z ~ 0 (see e.g. the reviews by Catelan, 2018; Soderblom, 
2010; Vandenberg et al., 1996 and recent determinations by O’Malley et al., 2017; Valcin et al., 2020, 2021) and of 
the look-back time at higher redshifts (Dunlop et al., 1996; Spinrad et al., 1997) have been very influential in the 
establishment of the (now) standard cosmological model. 


The age of the Universe or the look-back time, being an integrated quantity of H(z), has some limitations (both 
in terms of statistical power and in terms of susceptibility to systematics) that the cosmic chronometers approach 
attempts to overcome. 


3.1.1 Basic idea and equations 


The accurate determination of the expansion rate of the Universe, or Hubble parameter H(z) has become in recent 
years one of the main drivers of modern cosmology, since it can provide fundamental information about the energy 
content and on the main physical mechanisms driving its current acceleration. Its measurement is, however, very 
challenging, and while many works have focused on the estimate of its local value at z = 0 (the Hubble constant $H_0$,
see Sect. 2), we have nowadays few determinations of H(z), and mainly based on few methods (e.g., on the detection 
of the BAO signal in the clustering of galaxies and quasars, or on the analysis of SN data, see Font-Ribera et al., 
2014; Delubac et al., 2015; Alam et al., 2017; Riess et al., 2018; Scolnic et al., 2018; Bautista et al., 2021; Hou et al., 
2021; Raichoor et al., 2021; Riess et al., 2021a). These measurements, while having their own strengths, rely on the 
adoption of a cosmological scenario such as the assumption of flatness, on early physics assumptions (in the case of BAO)
and on calibration of the cosmic distance ladder (in the case of SNe); without these assumptions, these probes yield
the determination of the normalized expansion E(z) instead of H(z).


In this context, it is very important to explore alternative ways to determine the Hubble parameter, that can 
be compared, and eventually combined, with other determinations. The cosmic chronometers method is a novel 
cosmological probe able to provide a direct and cosmology-independent estimate of the expansion rate of the Universe. 
The main idea, introduced by Jimenez and Loeb (2002), is based on the fact that in a universe described by a FLRW 
metric the scale factor a(t) can be directly related with the redshift z as in Eq. 2. With this minimal assumption, it 
is therefore possible to directly express the Hubble parameter as a function of the differential time evolution of the 
universe dt in a given redshift interval dz, as provided by Eq. 15: 


$$ H(z) = -\frac{1}{1+z} \frac{dz}{dt} . $$


Here dt/dz can be taken to be the look-back time differential change with redshift. Since redshift is a direct observable, 
the challenge is to find a reliable estimator for look-back time, or age, over a range of redshifts, i.e. to find cosmic 
chronometers (CC). 
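The differential relation above can be turned into a finite-difference estimate once ages are available in two nearby redshift bins. The sketch below is a minimal illustration of that step only: the function name is hypothetical, the input ages are made-up numbers rather than measurements, and the effective redshift is simply taken as the midpoint of the two bins.

```python
INV_GYR_IN_KM_S_MPC = 977.79  # 1 Gyr^-1 expressed in km/s/Mpc (approximate)


def hubble_from_age_pair(z_low, age_low_gyr, z_high, age_high_gyr):
    """Estimate H(z) in km/s/Mpc from ages (in Gyr) at two nearby redshifts.

    Implements H = -1/(1+z) * dz/dt with finite differences.
    """
    dz = z_high - z_low
    dt = age_high_gyr - age_low_gyr  # negative: higher redshift means younger
    if dt >= 0.0:
        raise ValueError("the higher-redshift bin must be younger")
    z_eff = 0.5 * (z_low + z_high)   # assumed effective redshift of the pair
    h_z = -dz / dt / (1.0 + z_eff) * INV_GYR_IN_KM_S_MPC
    return z_eff, h_z


# Illustrative (not measured) ages of 9.20 and 8.44 Gyr at z = 0.4 and z = 0.5.
z_eff, h_z = hubble_from_age_pair(0.4, 9.20, 0.5, 8.44)
```

Only the age difference enters the estimate, so a constant offset shared by the two ages cancels out.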


The novelty and added value of this method with respect to other cosmological probes is that it can provide a 
direct estimate of the Hubble parameter without any cosmological assumption (beyond that of an FLRW metric). 
From this point of view, the strength of this method is its (cosmological) model independence: no assumption is 
made about the functional form of the expansion history or about spatial geometry; it only assumes homogeneity
and isotropy, and a metric theory of gravity. Constraints obtained with this method, therefore, can be used under
extremely varied cosmological models.


There are three main ingredients at the basis of the CC method: 


1. the definition of a sample of optimal CC tracers. As highlighted in Eq. 15, a sample of objects able 
to trace, at each redshift, the differential age evolution of the Universe is needed. It is fundamental that this 
sample of cosmic chronometers is homogeneous as a function of cosmic time (i.e., the chronometers started 
ticking in a synchronized way independently of the redshift they are observed at), and optimized in order 
to minimize the contamination due to outliers. The optimal selection process will be described in detail in 
Sect. 3.1.2. 


2. the determination of the differential age dt. The CC method is typically applied on tracers identified 
through spectroscopic analysis, where the redshift determination is extremely accurate (δz/(1 + z) < 0.001,
see e.g. Moresco et al., 2012a). As a consequence, as can be seen from Eq. 15, the only remaining unknown is 
the differential age dt. Different techniques have been explored to obtain robust and reliable differential age 
estimates for CC, to estimate statistical and systematic uncertainties, and they will be presented in Sect. 3.1.3. 


3. the assessment of the systematic effects. As for any other cosmological probe, one of the fundamental issues
to be assessed is the sensitivity of the method to effects that can systematically bias the measurement. All the 
various systematic effects will be examined in Sect. 3.1.4. 


3.1.2 Sample selection 


Cosmic chronometers are objects that should allow us to trace robustly and precisely the differential age evolution of 
the Universe across a wide range of cosmic times. For this reason, the most useful astrophysical objects are galaxies: 
with current ground and space-based facilities, these objects can be observed with reasonably high signal-to-noise 
over a wide area and range of redshifts. Two different approaches have been explored. 


Imagine selecting, in a given redshift range, a complete sample of galaxies, independently of their properties,
and estimating their ages so as to homogeneously populate the age(z) plane. With enough statistics, it becomes possible
to estimate the upper envelope (also called red envelope) of the age(z) distribution. Under the assumption that all 
galaxies formed at the same time independently of the observed redshift (which relies on the Copernican principle) 
and that the sample is complete, the envelope can be used to measure the differential age of the Universe. The 
advantage of this kind of approach is that the selection of the sample is very straightforward, at the cost of being 
significantly demanding, since to determine robustly the “edge” of the distribution and its associated error, very 
high statistics are needed in order not to be biased by random fluctuations in the determination of the ages of the 
population (e.g., see Jimenez and Loeb, 2002; Jimenez et al., 2003; Simon et al., 2005; Moresco et al., 2012a, where 
over 11000 massive and passive galaxies have been selected to apply this method). 
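A rough sketch of the envelope idea is given below. It bins galaxies in redshift and takes a high quantile of the age distribution in each bin as a proxy for the upper edge; this quantile estimator, the function name, and the default value of 0.95 are assumptions made for illustration and are not the edge-fitting procedures adopted in the works cited above.

```python
def red_envelope(redshifts, ages, bin_edges, quantile=0.95):
    """Return (bin centre, envelope age) pairs for each non-empty redshift bin.

    The envelope is approximated by a high quantile of the ages in the bin,
    which is less sensitive to single outliers than the maximum.
    """
    envelope = []
    for low, high in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = sorted(age for z, age in zip(redshifts, ages) if low <= z < high)
        if not in_bin:
            continue  # skip empty bins
        index = min(len(in_bin) - 1, int(quantile * len(in_bin)))
        envelope.append((0.5 * (low + high), in_bin[index]))
    return envelope
```

The resulting age(z) envelope points can then be differenced between adjacent bins to obtain dt for a given dz.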


A more practical solution, therefore, is to (pre-)select a homogeneous population representing at each redshift
the oldest objects in the Universe. The best cosmic chronometers that have been identified are extremely massive
(log(M/M⊙) > 10.5-11) and passively evolving galaxies (sometimes also inappropriately referred to as early-type
galaxies). These objects represent the most extreme tails in the mass function (MF) and luminosity function (LF), from
the local Universe (Baldry et al., 2004, 2006, 2008; Peng et al., 2010) up to high redshift (Pozzetti et al., 2010; Ilbert
et al., 2013; Zucca et al., 2009; Davidzon et al., 2017). Many recent studies (e.g., Daddi et al., 2004; Fontana et al.,
2006; Ilbert et al., 2006; Wiklind et al., 2008; Caputi et al., 2012; Castro-Rodriguez and Lopez-Corredoira, 2012;
Muzzin et al., 2013; Stefanon et al., 2013; Nayyeri et al., 2014; Straatman et al., 2014; Wang et al., 2016; Mawatari
et al., 2016; Deshmukh et al., 2018; Merlin et al., 2018, 2019; Girelli et al., 2019) have identified a population of
massive quiescent galaxies at high redshift (z = 2.5). There is a large literature supporting the scenario in which
these systems have built up their mass very rapidly (Δt < 0.3 Gyr, Thomas et al., 2010; McDermid et al., 2015; Citro
et al., 2017; Carnall et al., 2018) and at high redshifts (z > 2-3, Daddi et al., 2005; Choi et al., 2014; McDermid
et al., 2015; Pacifici et al., 2016; Carnall et al., 2018; Estrada-Carpenter et al., 2019; Carnall et al., 2019), having
quickly exhausted their gas reservoir and then evolving passively. For this reason, such objects constitute a
very homogeneous population also in terms of metal content, having been found to have a solar to slightly oversolar
metallicity from z ~ 0 up to z ~ 2 (Gallazzi et al., 2005; Onodera et al., 2012; Gallazzi et al., 2014; Conroy et al.,
2014; Onodera et al., 2015; McDermid et al., 2015; Citro et al., 2016; Comparat et al., 2017; Saracco et al., 2019;
Morishita et al., 2019; Estrada-Carpenter et al., 2019; Kriek et al., 2019). The mere existence of a population of
passive and massive galaxies already at z ~ 2 further supports this scenario (Franx et al., 2003; Cimatti et al., 2004;
Onodera et al., 2015; Kriek et al., 2019; Belli et al., 2019). A clear pattern has also been found strictly connecting
the mass, the star formation history (SFH), and the redshift of formation of these galaxies; within this scenario,
referred to as mass downsizing, more massive galaxies are found to have been formed earlier, to have experienced
a more intense, even if short, episode of star formation, and to have a very homogeneous SFH (Heavens et al.,
2004a; Cimatti et al., 2004; Thomas et al., 2010). To summarize, these galaxies represent a population where the
age difference dt between two suitably separated (and suitably narrow) redshift bins is significantly larger than their
internal time-scale evolution, making them optimal chronometers. For a more detailed review on massive and passive
galaxies, we refer to Renzini (2006).


Figure 1: Impact of selection criteria on the purity of CC samples. Left panel: stacked spectra of differently
selected samples of passive galaxies from the zCOSMOS survey in two different mass bins (log(M/M⊙) < 10.25
and log(M/M⊙) > 10.75), showing how, in many selection criteria, the contamination by significant emission lines
is still clearly evident, especially in the low mass bin. Note that in the high mass bin emission lines are not visible,
indicating much reduced contamination of the sample. Right panel: NUVrJ diagram for galaxies from the LEGA-C
survey. The points have been colored by their H/K ratio, where the dashed line shows the division between
passive and star-forming objects (Ilbert et al., 2013) (the shaded region identifies the green valley; Davidzon et al.,
2017), and the points highlighted in black are the selected CC. Images reproduced with permission from Moresco
et al. (2013) and Borghi et al. (2021b), copyright by Astronomy & Astrophysics and Astrophysical Journal.


Many different prescriptions have been suggested in the literature to select passive galaxies, based on rest-frame 
colors (Williams et al., 2009; Ilbert et al., 2010, 2013; Arnouts et al., 2013), the shape of the spectral energy 
distribution (SED) (Zucca et al., 2009; Ilbert et al., 2010), star formation rate (SFR) or specific SFR (sSFR) (see, 
e.g., Ilbert et al., 2010, 2013; Pozzetti et al., 2010), presence or absence of emission lines (see, e.g., Mignoli et al.,
2009; Wang et al., 2018), and even morphology. The important question in this context is whether these different
selection criteria are all equivalent for selecting CC. The short answer is no. In several papers (Franzetti et al., 2007;
Moresco et al., 2013; Belli et al., 2017; Schreiber et al., 2018; Fang et al., 2018; Merlin et al., 2018; Leja et al.,
2019; Diaz-Garcia et al., 2019) it has been found that a simple criterion is not able per se to select a pure sample
of passively evolving galaxies, and that, depending on the criterion, a conspicuous number of contaminants might
remain. This is clearly shown in the left panel of Fig. 1, reproduced from Moresco et al. (2013). The reference
and the figure highlight how passive galaxies selected with several different criteria still show evidence of emission
lines, with a residual contamination by blue/star-forming objects that, depending on the criterion, can be as high
as 30-50%. In the same work, as also reported by the figure, it was shown that a cut in stellar mass is helpful
to increase the purity of the sample, and that, at fixed criterion, the contamination is significantly smaller at high
masses (decreasing by a factor 2-3 from log(M/M⊙) < 10.25 to log(M/M⊙) > 10.75).
Both in Moresco et al. (2013) and in Borghi et al. (2021b) it has been demonstrated that, in order to maximize the
purity of the sample and to select the best possible sample of CC, different criteria should be combined (photometric, 
spectroscopic, stellar mass/velocity dispersion cut, potentially morphological). In Moresco et al. (2018), a detailed 
selection workflow has been proposed, which can be summarized in the following three criteria: 


i) a photometric criterion to select the reddest objects, based on the available photometric data. Among the best 
options is the one based on the NUVrJ diagram (Ilbert et al., 2013), but other alternatives are the UVJ 
diagram (Williams et al., 2009), the NUVrK diagram (Arnouts et al., 2013), or selections based on full SED 
modeling (e.g., see Ilbert et al., 2009; Zucca et al., 2009). It is important to underline, however, that having 
information about the UV flux has proven very important to discard the contamination by a young (0.1-1 
Gyr) population, and that the NUVrJ diagram has been demonstrated to be the most robust one to distinguish 
star-forming and passive populations. 


ii) a spectroscopic criterion, in order to check that no residual emission lines, which might trace the presence of 
ongoing star formation, are present in the spectrum. Depending on the redshift and on the wavelength coverage 
of the data, the most important emission lines to be checked are [OII]λ3727, Hβ (λ = 4861 Å), [OIII]λ5007, 
and Hα (λ = 6563 Å), and different kinds of cuts can be adopted, based on the equivalent width (EW) of the 
line (e.g., EW < 5 Å; Mignoli et al., 2009; Moresco et al., 2012a; Borghi et al., 2021b), on its signal-to-noise ratio 
(S/N; e.g., Moresco et al., 2016b; Wang et al., 2018), or on a combination of these. In general, it is important that 
the selected spectra do not show any sign of emission lines (as an example, see Fig. 2). 


iii) a cut in stellar mass or, equivalently, in stellar velocity dispersion σ. As discussed above, the more massive 
a system is, the older, more coeval, and less contaminated it is. Therefore, a cut around 
log(M/M⊙) > 10.6-11 is typically adopted. 
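
The three criteria above can be combined as boolean masks. The following is a minimal Python sketch, not the selection code used in the cited works: the function name `select_cc`, its argument names, the sign convention for the EW, and the numerical boundaries of the NUVrJ color cut are illustrative assumptions, while the EW < 5 Å and log(M/M⊙) > 10.6 thresholds are the example values quoted in the text.

```python
import numpy as np


def select_cc(nuv_r, r_j, ew_emission, log_mass,
              color_slope=3.0, color_offset=1.0, color_min=3.1,
              ew_max=5.0, log_mass_min=10.6):
    """Return a boolean mask flagging candidate cosmic chronometers.

    nuv_r, r_j   : rest-frame (NUV - r) and (r - J) colors
    ew_emission  : largest emission-line EW per galaxy, in Angstrom
                   (ASSUMED convention: emission is positive)
    log_mass     : log10 of the stellar mass in solar units
    """
    nuv_r = np.asarray(nuv_r, dtype=float)
    r_j = np.asarray(r_j, dtype=float)
    ew_emission = np.asarray(ew_emission, dtype=float)
    log_mass = np.asarray(log_mass, dtype=float)

    # (i) photometric cut in the NUVrJ plane; the boundary values are
    #     ASSUMED placeholders, to be replaced by the adopted reference
    red = (nuv_r > color_slope * r_j + color_offset) & (nuv_r > color_min)
    # (ii) spectroscopic cut: no significant emission lines
    no_emission = ew_emission < ew_max
    # (iii) stellar-mass cut
    massive = log_mass > log_mass_min

    # a galaxy is kept only if it passes all three criteria
    return red & no_emission & massive
```

Requiring all three masks at once reflects the point made above that no single criterion yields a pure sample; a velocity dispersion cut could replace the mass cut in the same way.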


Any other less stringent selection criteria will yield a sample with a residual degree of contamination by star-forming 
objects, which we will address in Sect. 3.1.4. It is interesting to notice that an alternative estimator has recently 
been suggested that can help to track the purity of the sample. In Moresco et al. (2018), the ratio between the 
CaII H (λ = 3969 Å) and K (λ = 3934 Å) lines has been introduced as a novel way to trace the degree of contamination 
by a star-forming component. The reason is that, while for a passive population the ratio H/K is typically larger 
than one (the K line being deeper than the H line), the presence of a young component affects this quantity, since it is 
characterized by non-negligible Balmer absorption lines, and in particular by the presence of Hε (λ = 3970 Å), which 
blends with the CaII H line, inverting the ratio. This new diagnostic has been demonstrated to be extremely 
powerful, since it correlates extremely well with almost all other indicators of ongoing star formation, as shown in 
the right panel of Fig. 1 (NUV and optical colors, SFR, emission lines, see Borghi et al., 2021b), and can be a useful 
independent indicator of the presence of residual ongoing star formation. 


3.1.3 Measurements 


Measuring the age of a stellar population presents several challenges. One of the main issues is the existence of 
degeneracies between the physical parameters, so that the spectral energy distribution (SED) of a galaxy can be 
approximately reproduced with quite different combinations of age and other parameters. The most well-known one 
is the age-metallicity degeneracy (Worthey, 1994; Ferreras et al., 1999), and it is connected to the fact that both an 
older age and a higher metallicity produce a reddening of galaxy spectra; in particular, it has been found from 
synthetic stellar population models that the optical colors of early-type galaxies obtained by changing their ages and 
metallicities while keeping the ratio Δage/Δ[Z/H] ~ 3/2 are almost the same. The degeneracy between the age of 
a galaxy and its star formation history (SFH) (Gavazzi et al., 2002) or its dust content should also be mentioned 
(even though we note that the second one is typically negligible for accurately selected passive galaxies, due 
to their low contamination by dust, see Pozzetti and Mannucci, 2000). Therefore, while age estimates for galaxies 
obtained from multi-band SED-fitting are quite common in the literature, they are not suitable for this purpose. 


With the advent of high-resolution spectroscopy over a wide wavelength range and for large galaxy samples, and of 
more accurate stellar models and fitting methods, it has become possible to lift these degeneracies and estimate the 
ages of the stellar populations of galaxies much more accurately and precisely. Moreover, the main strength of the CC 
method is that it is a differential approach, where the quantity to be measured is the differential age dt, and not the 
absolute age t. The advantage is that any systematic effect that might be introduced by any method in the estimate 
of t is significantly minimized in the measurement of dt; in particular, any constant systematic offset in the absolute 
age estimation will not impact the determination of dt. This is also confirmed by independent analyses (e.g., see 
Marin-Franch et al., 2009), demonstrating that the accuracy reached in the determination of relative ages is much 
higher than the one on absolute ages. 
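
The cancellation of a systematic offset can be made explicit with a one-line worked example. The symbols t_est (estimated age) and δ (offset) are introduced here only for illustration, and the offset is assumed to be the same at the two redshifts being compared:

```latex
t_{\rm est}(z) = t(z) + \delta
\quad\Longrightarrow\quad
dt_{\rm est} = t_{\rm est}(z_2) - t_{\rm est}(z_1)
             = \left[t(z_2) + \delta\right] - \left[t(z_1) + \delta\right]
             = t(z_2) - t(z_1) = dt
```

An offset that varies between z_1 and z_2 would not cancel exactly, which is one reason why the comparison is made over small redshift intervals.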


Different methods have been proposed in the literature to obtain a robust estimate of dt from galaxy spectra. 
These can be roughly classified into two “philosophies”: using the full spectral information versus selecting only specific 
features sensitive to the age and well localized in wavelength. Using the full spectral information extracts the 
maximal amount of information possible (minimizing statistical errors) but is more sensitive to systematics, i.e., 
to physical processes other than age that leave their imprint on the spectrum, and makes the age estimate somewhat 
dependent on evolutionary stellar population synthesis models. Using localized features attempts to mitigate that, 
at the expense of possibly larger statistical errors. To keep systematic errors well below the statistical ones, the 
preferred methodology might change depending on the statistical power of the datasets available. With very large, 
high-statistics datasets becoming available, the focus has shifted from full spectral fitting to using only specific 
features. 

The main methods to measure dt from galaxy spectra can be summarized as follows. 


3.1.3.1 Full-spectrum fitting 


The most straightforward approach is to take advantage of the full spectroscopic information available by fitting the 
entire spectrum with theoretical models. Different components, obtained from stellar population synthesis models, 
are typically combined with a mixture of different physical properties (age, metal content, mass), and properly 
weighted to reproduce the observed spectrum in a given wavelength window (usually within the optical range). 
The strength of this approach is therefore the ability to reconstruct, together with the age and metallicity of the 
population, also its star formation history, either in a parametric or non-parametric way. Currently, several codes 
have been developed and are publicly available to perform full-spectrum fitting, differing slightly in the model 
implemented, in how the SFH is reconstructed, and in the statistical methods. The first such method, which started the 
field, is the MOPED algorithm (Heavens et al., 2000, 2004b); after that, amongst the most used we can find STARLIGHT 
(Cid Fernandes et al., 2005), VESPA (Tojeiro et al., 2007), ULySS (Koleva et al., 2009), BEAGLE (Chevallard and 
Charlot, 2016), FIREFLY (Wilkinson et al., 2017), pPXF (Cappellari, 2017), and BAGPIPES (Carnall et al., 2018). In 
Fig. 2 we show as an example the typical spectrum of a passively evolving population obtained by stacking roughly 
100,000 spectra extracted from the Sloan Digital Sky Survey Data Release 12 (SDSS-DR12). The figure also highlights 
the locations of relevant spectral features. 


3.1.3.2 Absorption features (Lick indices) analysis 


Another approach is to analyze, instead of the full spectrum, only some specific regions characterized by well 
understood absorption features, also known as Lick indices. These indices, originally introduced by Worthey (1994) 
and Worthey and Ottaviani (1997), are characterized by a strength that can be directly linked to a variation of the 
properties of the stellar population; some indices are more useful to trace the age of the population (typically the 
Balmer lines), others the stellar metallicity (typically Fe lines), and others the alpha-enhancement (e.g., Mg lines). 
Also in this case, public codes exist to measure Lick indices (see, e.g., indexf, Cardiel, 2010, and pyLick, Borghi et al., 
2021b). The specific dependence of each index (shown in Fig. 2) on physical properties was first assessed in 
Worthey (1994). A significant step forward in their use to quantitatively determine the age of a stellar population 
was made by Thomas et al. (2011); this consists in constructing stellar population models specifically suited 
for modeling Lick indices, including variable element abundance ratios, that can be compared with the data (e.g., 
with a Bayesian approach). This step is fundamental since it overcomes a limitation of full-spectrum fitting, 
making it possible to determine, together with the age and metallicity, also the alpha-enhancement of a 
stellar population. It is worth noting that more recently other models with variable element ratios that could be 
used for this purpose have also been proposed by Conroy and van Dokkum (2012) and Vazdekis et al. (2015). 

Figure 2: Stacked spectrum of ~100,000 massive and passive CC selected from SDSS DR12. It is clearly 
characterized by a red continuum, several absorption lines (identified by the black boxes), and 
by the absence of significant emission lines (whose position is highlighted by the red boxes). 
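
As an illustration of what measuring an absorption index of the Lick type involves, the sketch below computes an equivalent width relative to a pseudo-continuum drawn as a straight line between two side bands. This is a simplified stand-in written for this text, not the algorithm of indexf or pyLick: the function name `lick_ew` is hypothetical, a uniformly sampled rest-frame wavelength grid is assumed, and the bandpass limits must be supplied by the user from the official index definitions.

```python
import numpy as np


def lick_ew(wave, flux, index_band, blue_band, red_band):
    """Equivalent width (same units as `wave`) of an absorption feature.

    wave, flux : rest-frame wavelength grid (ASSUMED uniform) and flux
    *_band     : (lower, upper) wavelength limits of the index band and
                 of the blue/red pseudo-continuum side bands
    """
    wave = np.asarray(wave, dtype=float)
    flux = np.asarray(flux, dtype=float)

    def band_mean(band):
        sel = (wave >= band[0]) & (wave <= band[1])
        if not sel.any():
            raise ValueError("band not covered by the spectrum")
        return flux[sel].mean()

    # pseudo-continuum: straight line through the side-band midpoints
    x_blue, x_red = 0.5 * sum(blue_band), 0.5 * sum(red_band)
    f_blue, f_red = band_mean(blue_band), band_mean(red_band)

    sel = (wave >= index_band[0]) & (wave <= index_band[1])
    continuum = f_blue + (f_red - f_blue) * (wave[sel] - x_blue) / (x_red - x_blue)

    # EW = integral of (1 - F / F_continuum) over the index band,
    # approximated by a sum on the uniform grid
    step = wave[1] - wave[0]
    return float(np.sum(1.0 - flux[sel] / continuum) * step)
```

With a flat continuum and a rectangular absorption of depth 0.5 and width 10 Å inside the index band, the function returns an equivalent width close to 5 Å, as expected from the definition.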


3.1.3.3 Calibration of specific spectroscopic features 


Finally, one of the most commonly adopted approaches in CC works is to focus on a single spectroscopic feature 
found to have a tight correlation with the age of the population. This approach was introduced by Moresco et al. 
(2012a), who proposed to use the break in the spectrum at 4000 Å rest-frame (D4000, one of the main characteristics of 
the spectrum of a passive galaxy, as also shown in Fig. 2). The D4000 has been demonstrated to correlate extremely 
well with the stellar age (at fixed metallicity). Moreover, it has been shown that the dependence of D4000 on the two 
quantities (age and metallicity Z) can be described by a simple (piece-wise) linear relation in the range of interest 
for the analysis: 

D4000 = A(Z, SFH) \times {\rm age} + B, (17) 


where B is a constant and A(Z, SFH) is a parameter, which for a broad age range depends only on the metallicity 
Z and on the SFH, and can be calibrated on stellar population synthesis (SPS) models. By differentiating Eq. 17, 
it is possible to derive the relation between the differential age evolution of the population dt and the differential 
evolution of the feature, dD4000, in the form dD4000 = A(Z, SFH) × dt. This allows us to rewrite Eq. 15 as: 


H(z) = -\frac{A(Z, SFH)}{1+z} \frac{dz}{dD4000}, (18) 

with the advantage of having decoupled statistical (all included in the observationally measurable term dz/dD4000) 
from systematic effects (captured by the coefficient A(Z, SFH)). We note here that different definitions have been 
proposed in the literature to measure the D4000, which is the ratio between the average flux ⟨F_ν⟩ in two windows 
adjacent to 4000 Å rest-frame: one assuming wider bands (D4000_w, [3750-3950] Å and [4050-4250] Å; Bruzual A., 
1983) and one with narrower ones (D4000_n, [3850-3950] Å and [4000-4100] Å; Balogh et al., 1999). In the following, we 
will consider D4000_n, since it has been demonstrated to have a significantly smaller dependence 
on potential reddening effects (Balogh et al., 1999). 
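
The definition of the break as a ratio of average fluxes can be written down compactly. The sketch below is only an illustration under stated assumptions: the function name `d4000` is hypothetical, the input spectrum is assumed to be given as F_λ on a uniformly sampled rest-frame wavelength grid in Å, and the default windows are the narrow ones quoted above.

```python
import numpy as np


def d4000(wave_rest, flux_lambda, blue=(3850.0, 3950.0), red=(4000.0, 4100.0)):
    """Ratio of the mean F_nu in the red window to that in the blue window.

    wave_rest   : rest-frame wavelengths in Angstrom (ASSUMED uniform grid)
    flux_lambda : flux density per unit wavelength, F_lambda
    blue, red   : window limits; defaults are the narrow-band definition
    """
    wave_rest = np.asarray(wave_rest, dtype=float)
    flux_lambda = np.asarray(flux_lambda, dtype=float)

    # F_nu is proportional to lambda^2 * F_lambda; the constant factor
    # cancels in the ratio, so it is omitted
    flux_nu = flux_lambda * wave_rest**2

    def window_mean(limits):
        sel = (wave_rest >= limits[0]) & (wave_rest <= limits[1])
        if not sel.any():
            raise ValueError("window not covered by the spectrum")
        return flux_nu[sel].mean()

    return window_mean(red) / window_mean(blue)
```

Passing `blue=(3750.0, 3950.0)` and `red=(4050.0, 4250.0)` gives the wide-band definition instead; the two should not be mixed within one analysis, since the calibration of A(Z, SFH) depends on the definition adopted.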


To apply the improved CC method as described by Eq. 18, it is therefore necessary to measure the following quantities: 


1. the differential ΔD4000 of a sample of CC over a redshift interval Δz. Since this process involves the estimate 
of a derivative, to increase its accuracy and minimize the noise due to statistical fluctuations of the signal, it 
should be done either by averaging the D4000 of galaxies in redshift slices and then estimating dD4000, or by stacking 
multiple spectra of CC to increase the spectral S/N and measuring the D4000 on the stacked spectra, as 
shown in Fig. 3. Equation 18 disentangles observational errors from the systematic errors associated with 
the interpretation (such as dependence on the SSP model, degeneracies with metallicity, etc.). The D4000 
is a purely observational quantity, and thus, barring observational systematics such as wavelength calibration 
or instrument response, its measurement is affected only by statistical uncertainty, which can be reduced by 
increasing the number of objects with spectra and/or increasing the S/N per spectrum. 


Figure 3: Application of the CC method. The left panel shows an example of averaged D4000-z relations (with 
uncertainties smaller than the symbol size, so that differences can be robustly computed) for a CC sample 
extracted from SDSS-DR12, in different velocity dispersion bins as shown in the label. A downsizing pattern 
is clearly evident, in which more massive galaxies (with higher σ) also have larger D4000 values, corresponding 
to higher ages; the panel also shows the expected decrease of D4000 with redshift. The brackets show, for an 
illustrative pair of points, the calculation of dD4000 and dz. The right panel shows theoretical D4000-age relations 
obtained with SPS models by Maraston and Strömbäck (2011), used to calibrate Eq. 18. Lines from the upper to the 
lower ones show different stellar metallicities, from twice solar, to solar and half solar. Different lines present, at 
fixed metallicity, different SFHs, namely with τ = [0.05, 0.1, 0.2, 0.3] Gyr (from left to right). The colored lines 
show, for one SFH for each metallicity, the best fit obtained with a piece-wise linear relation. The arrows indicate 
how different parameters affect the D4000-age relations; it is important to keep in mind that the calibration 
parameter A(Z, SFH) is the slope of the relation. 


2. the metallicity Z and SFH of the selected sample. As a result of the strict selection criteria (see Sect. 3.1.2), 
selected galaxies are characterized by a SFH with a very short duration: τ < 0.5 Gyr (in many cases < 0.2 
Gyr) when parameterized with an exponentially declining SFH, with τ the formation timescale (in Gyr). 
Nevertheless, the SFH should be taken into account and correctly propagated in the measurement, since, despite 
the selection, describing those systems as single stellar populations (SSP) would be over-simplistic. The methods 
to estimate the SFH are mostly based on SED-fitting, on full-spectrum fitting, or on a combination of those 
(see, e.g., Tojeiro et al., 2007; Chevallard and Charlot, 2016; Citro et al., 2016; Carnall et al., 2018, 2019). 
Despite the fact that by construction the CC population is very homogeneous also in metal content, and it 
has been observed to have a solar to slightly over-solar metallicity over a very wide range of cosmic times 
(see Sect. 3.1.2), the stellar metallicity Z needs to be determined too. Also in this case, different approaches 
are viable, from considering a data-driven prior on it (Moresco et al., 2012a), to estimating it with full-spectrum 
fitting considering different codes and models (Moresco et al., 2016b), or measuring it from Lick index analysis 
(Gallazzi et al., 2005; Borghi et al., 2021b). 


3. the calibration parameter A(Z, SFH) to connect variations in D4000 to variations in the age of the stellar 
population, assuming different SPS models. This involves generating several D4000-age relations exploring 
different metallicities and SFHs, and adopting several different SPS models. 


As already discussed, these relations can be well approximated as linear (or, better, piece-wise linear, as 
shown in Fig. 3), with slopes that give the parameter A(Z, SFH) in the regime of interest. At fixed metallicity 
and in a given D4000 regime, it is then possible to estimate the spread in the slopes obtained by varying 
the SFH within the observed ranges, and use this as the uncertainty associated with the calibration parameter, 
i.e., A(Z, SFH) = A(Z) ± σ_A(SFH). These measurements, available from models at given metallicities (e.g., 
Z/Z⊙ = 0.5, 1, 2 for the example in Fig. 3), can afterwards be interpolated to obtain a value with its error for 
any given metallicity. The correct calibration parameter for each point will therefore be estimated from the 
measured (or assumed) metallicity, together with its error, for a global A ± σ_A that takes into account both the 
uncertainty on SFH and on metallicity. We will explore the impact of the SPS model choice on the systematic 
error budget in Sect. 3.1.4. 
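
The calibration step described in the last item can be sketched as follows. This is an assumption-laden illustration rather than the published procedure: the function names are hypothetical, the model D4000-age relations are taken as plain arrays supplied by the user, and the spread among SFHs is summarized by a standard deviation, which is one possible choice among others (e.g., a half-range).

```python
import numpy as np


def calibrate_slope(age_gyr, d4000_by_sfh, d4000_range):
    """Mean slope A (D4000 per Gyr) and its spread over SFHs, at fixed Z.

    age_gyr      : model ages in Gyr
    d4000_by_sfh : one D4000(age) array per SFH, on the same age grid
    d4000_range  : (min, max) of the D4000 regime where a linear fit is used
    """
    age_gyr = np.asarray(age_gyr, dtype=float)
    slopes = []
    for d4000 in d4000_by_sfh:
        d4000 = np.asarray(d4000, dtype=float)
        sel = (d4000 >= d4000_range[0]) & (d4000 <= d4000_range[1])
        if sel.sum() < 2:
            raise ValueError("not enough model points in the D4000 regime")
        slope, _intercept = np.polyfit(age_gyr[sel], d4000[sel], 1)
        slopes.append(slope)
    slopes = np.array(slopes)
    # ASSUMED summary: mean as A(Z), standard deviation as sigma_A(SFH)
    return slopes.mean(), slopes.std()


def interpolate_slope(z_target, z_grid, a_grid, sigma_grid):
    """Linearly interpolate A and sigma_A to an arbitrary metallicity.

    z_grid must be increasing (e.g., Z/Zsun = 0.5, 1, 2).
    """
    a = np.interp(z_target, z_grid, a_grid)
    sigma = np.interp(z_target, z_grid, sigma_grid)
    return float(a), float(sigma)
```

Repeating `calibrate_slope` for each D4000 regime handles the piece-wise linear case, and repeating the whole procedure for each SPS model gives the inputs needed to assess the model dependence.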


All these quantities will be combined in Eq. 18 to obtain an estimate of the Hubble parameter H(z) and of its 
uncertainty. 
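
As a worked illustration of how these quantities combine in Eq. 18, the sketch below computes H(z) from two (z, D4000) points and a calibration slope. Several choices are assumptions made here for simplicity: the function name, the use of the mean redshift of the pair in the 1+z factor, and the first-order, uncorrelated error propagation. The conversion factor follows from 1 Gyr⁻¹ ≈ 977.8 km/s/Mpc.

```python
import numpy as np

# 1 Gyr^-1 expressed in km/s/Mpc
KM_S_MPC_PER_INV_GYR = 977.8


def hubble_from_d4000(z1, d1, z2, d2, slope, slope_err=0.0,
                      d1_err=0.0, d2_err=0.0):
    """Evaluate H(z) = -A/(1+z) * dz/dD4000 for one pair of points.

    z1, z2     : redshifts of the two bins (z2 > z1)
    d1, d2     : mean (or stacked) D4000 in the two bins
    slope      : calibration parameter A(Z, SFH), in D4000 per Gyr
    *_err      : 1-sigma uncertainties (ASSUMED uncorrelated)
    Returns (z_eff, H, H_err) with H in km/s/Mpc.
    """
    dz = z2 - z1
    dd4000 = d2 - d1
    if dd4000 == 0.0:
        raise ValueError("dD4000 is zero: the derivative is undefined")
    z_eff = 0.5 * (z1 + z2)  # ASSUMED effective redshift of the pair

    h_inv_gyr = -slope / (1.0 + z_eff) * dz / dd4000
    hubble = h_inv_gyr * KM_S_MPC_PER_INV_GYR

    # relative errors on A and on dD4000 added in quadrature
    rel_err = np.sqrt((slope_err / slope) ** 2
                      + (d1_err**2 + d2_err**2) / dd4000**2)
    return z_eff, hubble, abs(hubble) * rel_err
```

For example, with D4000 decreasing from 1.95 at z = 0.40 to 1.90 at z = 0.50 and a slope of 0.05 Gyr⁻¹, the estimate is about 67 km/s/Mpc; these input numbers are invented for illustration and are not measurements.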


A final, yet important point to keep in mind is that, in order to be cosmology-independent, the CC approach 
must rely on age estimates that do not assume any cosmological prior. This is a very important point, since in many 
(if not in most) analyses, a cosmologically-motivated upper prior on age is adopted in order to break or minimize the 
previously discussed degeneracies. Of course, for the CC method to be used as a test for cosmology, it is of paramount 
importance to obtain a robust age estimate without introducing any (prior) dependence on a cosmological model, in 
order to avoid circularity, i.e., simply retrieving the cosmological model used as a prior. 


3.1.4 Systematic effects 


In this section, we give an overview of the possible systematic effects that can affect the CC method, discussing 
approaches to minimize them and propagate them to the total covariance matrix. We begin by discussing effects 
and assumptions that have a direct impact on the uncertainty on H(z), and conclude presenting additional possible 
issues that might impact on the measurement, but that turn out to be negligible. 


The main systematic effects can be divided into four components, and are summarized below. Each one of those 
will provide a contribution to the total systematic covariance matrix Cov_{ij}^{syst}. 


Error in the CC metallicity estimate Cov_{ij}^{met}. The metallicity estimate enters in Eq. 18 by changing the 
calibration parameter A(Z, SFH). An error in its value, therefore, directly affects the H(z) measurement and 
its associated error budget. In Moresco et al. (2020), this issue has been addressed by performing a Monte Carlo 
simulation of SSP-generated galaxy spectra considering a variety of SPS models, with metallicities spanning different 
ranges (±10%, 5%, 1%) and estimating the Hubble parameter. In this way, it was estimated that the error induced on 
H(z) scales almost linearly with the uncertainty on the stellar metallicity, which is corroborated observationally by 
the analysis in Moresco et al. (2016b), where a 10% error on the metallicity was found to correspond to a 10% error 
on the Hubble parameter. Hence, the uncertainty on stellar metallicity (if known and quantified correctly) can be 
quantitatively propagated to an error on H(z) following the procedure highlighted in Sect. 3.1.3. This contribution 
does not introduce off-diagonal terms in the covariance matrix because it depends on the stellar metallicity of each 
spectrum (be it of an individual object or a co-add) and does not correlate different spectra. 


Error in the CC SFH Cov_{ij}^{SFH}. Even if CC have SFHs characterized by very short timescales, assuming that the 
entire SFH is concentrated in a single burst (SSP) introduces a systematic error which must be accounted for as 
described in Sect. 3.1.3. This is typically a systematic contribution of the order of 2-3%; as an example, in Moresco 
et al. (2012a), where the estimated uncertainty on the SFH timescale was 0 < τ < 0.3 Gyr, the contribution to the 
final error on H(z) was ~2.5%. This contribution to the covariance matrix is also taken to be purely diagonal. 


Assumption of SPS model Cov_{ij}^{model}. The major source of systematic uncertainty in the CC method, independently 
of the process adopted to estimate dt, is the assumption of the SPS model. This is also by definition a 
term that introduces non-diagonal elements in the total covariance matrix, as the errors are highly correlated across 
different spectra. Its impact on the H(z) error was assessed in Moresco et al. (2020). In this work, 
a wide combination of models was studied, including a variety of SPS models (BC03 and BC16, Bruzual and Charlot, 
2003; M11, Maraston and Strömbäck, 2011; FSPS, Conroy et al., 2009; Conroy and Gunn, 2010; and E-MILES, 
Vazdekis et al., 2016), initial mass functions (IMF, including Salpeter, Salpeter, 1955; Kroupa, Kroupa, 2001; and 
Chabrier, Chabrier, 2003), and stellar libraries (STELIB, Le Borgne et al., 2003; and MILES, Sanchez-Blazquez et al., 
2006). These models have then been used with a Monte Carlo (MC) approach, simulating a measurement assuming one 
model and measuring the Hubble parameter with all the other ones, estimating in this way the contribution to the total 
covariance matrix due to the assumption of a specific SPS model, IMF, and stellar library. It was demonstrated that 
the error introduced on H(z) is, on average, smaller than 0.4% for the IMF contribution, and of the order of 4.5% 
for the SPS model contribution. 


The component due to stellar library is slightly higher, however this estimate is overly-conservative as the effect 
is driven by the inclusion of a stellar library model that has now been superseded. More importantly, it has been 
found that this uncertainty is also redshift dependent, and an explicit estimate for each component is provided as a 
function of z. 


Rejuvenation effect Cov_{ij}^{young}. Another possible bias to take into account arises if the selected CCs present a 
residual contamination by a young component. We can divide this systematic effect into two cases. On the one side, 
part of the selected CC population could be composed of star-forming or intermediate systems; this should be 
avoided, or maximally mitigated, by the accurate and combined selection process described in Sect. 3.1.2. On the 
other side, despite the accurate selection, the population of a single CC, even if dominated by 
an old component, could still have a minor contribution from a young underlying component of stars. This effect can bias 
the H(z) determination because it influences the overall shape of the spectrum due to the bluer color of younger 
stars, causing the measurement of younger ages and hence a biased dt. This issue has been studied in detail in 
Moresco et al. (2018), where several indicators have been explored and proposed to trace the possible presence of a 
residual young sub-population, from the UV flux (Kennicutt, 1998) to the presence of emission lines (see, e.g., Magris 
C. et al., 2003) or of strong higher-order Balmer absorption lines (like Hδ; Le Borgne et al., 2006). In particular, 
by studying theoretical SPS models, the previously discussed CaII H/K indicator was proposed to quantitatively 
trace the percentage level of contamination, taking advantage of the fact that the Hε line, characteristic of a young 
stellar component, directly affects the CaII H line, and therefore the ratio. It was then assessed, given a certain 
degree of contamination, how much the D4000 would be decreased and, therefore, how much the estimate of H(z) 
is impacted, giving in this way a direct recipe relating the measured CaII H/K (or upper limit due to non-detection) 
to an additional error on the Hubble parameter. A contamination by a star-forming young component of 10% (1%) 
of the total light was found to propagate to an H(z) error of 5% (0.5%); in particular, for the CC samples analyzed 
so far (Moresco et al., 2012a; Moresco, 2015; Moresco et al., 2016b; Borghi et al., 2021b), this 
contamination has been found to be below the detectable threshold, with a possible additional error on H(z) < 0.5%. In 
case of a lack of detection, and given the stringent upper limit on a possible residual contamination, this contribution 
to the covariance is also taken to be diagonal. 


Following Moresco et al. (2020), the total covariance matrix for CC is defined as the combination of the statistical 
and systematic parts as: 

Cov_{ij} = Cov_{ij}^{stat} + Cov_{ij}^{syst}, (19) 


where Cov_{ij}^{syst}, for simplicity and transparency, is decomposed into the several contributions discussed above: 


Cov_{ij}^{syst} = Cov_{ij}^{met} + Cov_{ij}^{young} + Cov_{ij}^{model}, (20) 


where the last component can be further decomposed into: 


Cov_{ij}^{model} = Cov_{ij}^{SFH} + Cov_{ij}^{IMF} + Cov_{ij}^{st.lib.} + Cov_{ij}^{SPS}. (21) 


As discussed above, Cov_{ij}^{met}, Cov_{ij}^{SFH} and Cov_{ij}^{young} are purely diagonal terms, since they are related 
to the estimate of physical properties of a galaxy (the stellar metallicity, and the possible contamination by a younger 
subdominant population) that are uncorrelated for objects at different redshifts. Cov_{ij}^{model}, instead, has been 
conservatively estimated assuming that the contributions from different redshifts are fully correlated. In the published 
analyses of currently available datasets, the contributions Cov_{ij}^{met}, Cov_{ij}^{SFH} and Cov_{ij}^{young} are already 
included in the errors provided (and discussed later in Sect. 3.1.5 and Tab. 1); the other terms have instead to be 
included following these recipes (see footnote 2). 
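
The structure of Eqs. 19-21 can be illustrated with a short sketch that assembles a total covariance from a diagonal part and a fully correlated part. This is not the code of the notebooks referenced in the footnote: the function names are hypothetical, `sigma_diag` stands for the errors already quoted with the data, and `frac_model` is an assumed per-point fractional error for one fully correlated component (one such term would be added for each of the IMF, stellar library, and SPS contributions).

```python
import numpy as np


def cc_covariance(hz, sigma_diag, frac_model):
    """Total covariance: diagonal errors plus one fully correlated term.

    hz         : measured H(z) values
    sigma_diag : 1-sigma errors entering only the diagonal
    frac_model : fractional model error at each redshift (e.g., 0.045)
    """
    hz = np.asarray(hz, dtype=float)
    sigma_diag = np.asarray(sigma_diag, dtype=float)
    model_err = hz * np.asarray(frac_model, dtype=float)

    cov_diag = np.diag(sigma_diag**2)
    # fully correlated contribution: Cov_ij = err_i * err_j
    cov_model = np.outer(model_err, model_err)
    return cov_diag + cov_model


def chi2(hz_obs, hz_model, cov):
    """Chi-square of a model given data and their full covariance."""
    resid = np.asarray(hz_obs, dtype=float) - np.asarray(hz_model, dtype=float)
    return float(resid @ np.linalg.solve(cov, resid))
```

Using `np.linalg.solve` avoids forming the inverse covariance explicitly. Neglecting the off-diagonal term would treat the model uncertainty as if it averaged down with the number of points, which it does not when the errors are correlated across redshift.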

2 To expand the analysis taking into account also the other systematic effects, a tutorial with dedicated Jupyter 
notebooks is provided at https://gitlab.com/mmoresco/CC_covariance. 

Other effects, which have been demonstrated to have a negligible impact on the measurement, but which should 
be mentioned, are the following: 


• progenitor bias. A common observational effect that can introduce biases in the analysis of early-type galaxies 
is the so-called progenitor bias (Franx and van Dokkum, 1996; van Dokkum et al., 2000): a given selection 
criterion might be effectively more stringent when applied at high redshift than at low redshift. In particular, 
high redshift objects that pass the sample selection might be older and more massive than those selected at 
low redshift, effectively representing the progenitor population of the low redshift sample. This bias becomes 
increasingly relevant when comparing objects spanning a wide range of redshifts, and, if not properly taken 
into account, could significantly affect the CC approach, since by definition it flattens the age-z relation, 
changing its slope and hence producing a biased H(z). The differential approach at the basis of CC by definition 
acts to minimize this effect, since in all cases the galaxies being compared span a very small range of redshifts. 
A quantitative estimate of its impact on the CC approach has been made in Moresco et al. (2012a) with two 
different methods. On the one side, the analysis has been repeated considering only the upper envelope of 
the age-z distribution, which, by definition, cannot be affected by the progenitor bias. The resulting 
H(z) is in perfect agreement with the baseline analysis, even if with larger error bars due to the lower 
statistics on which the upper envelope approach is based (see Sect. 3.1.2). On the other side, the expected 
change in slope of the age-z relation, assuming a very conservative change in formation times for the CCs 
considered, has also been estimated. In this conservative estimate, it was found that the error induced on the 
estimated H(z) is ~1% on average, which is negligible considering the rest of the error budget. 


• mass-dependence. A final effect to be further explored is whether the results have some mass-dependent bias. This 
effect has been explored thoroughly in many analyses (Moresco et al., 2012a, 2016b; Borghi et al., 2021a), and 
in all cases the H(z) measured in different mass (or velocity dispersion) bins have been found to be mutually 
consistent, and with no systematic trends. This is in agreement with expectations, since CC are selected to 
be already very massive galaxies (log(M/M⊙) ≳ 11), comprising very homogeneous systems, as discussed in 
Sect. 3.1.2. 


3.1.5 Main results 


The first measurement with the CC method dates back to Simon et al. (2005), who analyzed a sample of 
passively evolving galaxies from the luminous red galaxy (LRG) sample of the SDSS early data release, combined 
with higher redshift data from the GDDS survey and archival data. The ages of these objects have been estimated 
with full-spectrum fitting using SPEED models (Jimenez et al., 2004), estimating the age of the oldest components 
while marginalizing over metallicity and SFH. Applying then the CC approach, 8 H(z) measurements were obtained in 
the range 0 < z < 1.75 (see footnote 3). Similarly, Zhang et al. (2014) and Ratsimbazafy et al. (2017) determined new 
values of the Hubble parameter, measuring dt with a full-spectrum fitting technique. They studied a sample of ~17,000 
LRGs from SDSS Data Release Seven (DR7) and of ~13,000 LRGs from the 2dF-SDSS LRG and QSO catalog, respectively, 
both extracting differential age information for their sample using the ULySS code and BC03 models, obtaining four 
additional estimates of H(z) at z < 0.3 and one at z ~ 0.47, respectively. 

The results by Moresco et al. (2012a), Moresco (2015), and Moresco et al. (2016b) are instead based on the analysis 
of the D4000 feature described in Sect. 3.1.3. The first paper examined a compilation of very massive and passively 
evolving galaxies extracted from the SDSS Data Release 6 Main Galaxy Sample and Data Release 7 LRG sample and 
from a combination of spectroscopic surveys at higher redshifts (COSMOS, K20, UDS), comprising in total ~11,000 
galaxies in the range 0.15 < z < 1.3. The second paper analyzed a significantly smaller sample (29 objects) of massive 
and passive galaxies available in the literature at very high redshifts z > 1.4. Finally, the last paper considers the 
SDSS BOSS Data Release 9, selecting a sample of more than 130,000 CC in the range 0.3 < z < 0.55. In total, 15 
additional H(z) estimates are presented in the range 0.18 < z < 2. 

Most recently, in Borghi et al. (2021b) a new approach was explored, using a Lick-indices-based analysis applied 
to CC extracted from the LEGA-C survey to derive information on the physical properties (age, metallicity and 
α-enhancement) of the population, and in Borghi et al. (2021a) the resulting dt measurements were used to obtain 
a new estimate of the Hubble parameter. 


The current, most updated compilation of H(z) measurements obtained with CC is shown in Fig. 4, and provided 
in Tab. 1. All these measurements have been obtained assuming a SPS model (BC03, Bruzual and Charlot, 2003), 


3Because of the high-quality of the spectra analysed in Jimenez et al. (2004), it was possible to compute relative ages with few percent 
accuracy in between the different redshift bins. Here the clue is once again relative ages and not the absolute ones plotted in Fig. 1 of 
(Jimenez et al., 2004). 


17 


[Figure 4 plot. Vertical axis: H(z) [km/s/Mpc]; horizontal axis: redshift, from 0 to 2. Legend: Simon et al. (2005); Stern et al. (2010); Moresco et al. (2012); Zhang et al. (2014); Moresco (2015); Moresco et al. (2016); Ratsimbazafy et al. (2017); Borghi et al. (2021).]

Figure 4: Hubble parameter measurements obtained with the CC method. Different colors refer to different
methods adopted to estimate dt, as presented in Tab. 1. The dashed line shows the flat ΛCDM cosmological
model from Planck Collaboration et al. (2020a) as a purely illustrative reference.


except for the measurements from Moresco et al. (2012a), Moresco (2015), and Moresco et al. (2016b), which are
also available with a different set of SPS models (M11, Maraston and Strömbäck, 2011). Since, as discussed above,
one of the main sources of systematic uncertainty is the SPS model assumed, for a coherent analysis the systematic
off-diagonal component of the covariance has to be added following the recommendations of Sect. 3.1.4, and with
the recipes presented in Moresco et al. (2020).


These data have been widely used in the literature in a variety of applications, which we proceed to present below. 


3.1.5.1 Independent estimates of the Hubble constant H0.


In the framework of the well-established tension between early- and late-Universe-based determinations of the Hubble
constant (Verde et al., 2019; Di Valentino et al., 2021), obtaining independent estimates of H0 is of great importance,
as it can provide additional information to test or constrain the underlying cosmological models. By providing
cosmology-independent estimates of H(z), whose calibration does not depend on early-time physics or on the traditional
cosmic distance ladder, CCs are of value and, by extrapolating H(z) to z = 0, could inform the current debate
over the Hubble tension.


This analysis can be done either by directly fitting CC data with a cosmological model (Moresco et al., 2011,
2012b, 2016a) or, to take full advantage of the cosmology-independent approach, by employing extrapolation techniques
that do not rely on cosmological models, such as Gaussian Processes or Padé approximation (Verde et al., 2014;
Montiel et al., 2014; Haridasu et al., 2018; Gómez-Valent and Amendola, 2018; Capozziello and Ruchika, 2019; Sun
et al., 2021b; Bonilla et al., 2021; Colgáin and Sheikh-Jabbari, 2021), or by using alternative diagnostics (e.g.,
see Sapone et al., 2014; Krishnan et al., 2021). For currently published analyses using CC alone, the size of the
error-bars on H0, including systematic uncertainties, is still too large to weigh in on the tension.
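
As an illustration of the first option (a direct model fit), the following minimal sketch fits a flat ΛCDM expansion history to a handful of H(z) points. It is not the procedure of the cited works: it uses only diagonal errors (whereas the full covariance of Sect. 3.1.4 should be used in any real analysis), a simple grid over Ωm, and a subset of Tab. 1 chosen purely for illustration. The function names and the grid range are assumptions made here.

```python
import math

def hubble_flat_lcdm(z, h0, omega_m):
    """H(z) in km/s/Mpc for a flat LambdaCDM model (radiation neglected)."""
    return h0 * math.sqrt(omega_m * (1.0 + z) ** 3 + 1.0 - omega_m)

def fit_flat_lcdm(data, n_grid=991):
    """Grid search over Omega_m; H0 is solved analytically at each grid point.

    data: list of (z, H, sigma) tuples; only diagonal errors are used here.
    Returns (h0_best, omega_m_best, chi2_min).
    """
    best = None
    for j in range(n_grid):
        omega_m = 0.01 + 0.001 * j  # hypothetical grid: 0.01 ... 1.00
        shape = [hubble_flat_lcdm(z, 1.0, omega_m) for z, _, _ in data]
        # For fixed Omega_m the model is linear in H0, so chi2 is minimized in closed form
        num = sum(h * e / s ** 2 for (_, h, s), e in zip(data, shape))
        den = sum(e ** 2 / s ** 2 for (_, _, s), e in zip(data, shape))
        h0 = num / den
        chi2 = sum(((h - h0 * e) / s) ** 2 for (_, h, s), e in zip(data, shape))
        if best is None or chi2 < best[2]:
            best = (h0, omega_m, chi2)
    return best

# D4000-based points of Moresco et al. (2012a) as listed in Tab. 1: (z, H, sigma)
cc_subset = [(0.179, 75, 4), (0.199, 75, 5), (0.352, 83, 14), (0.593, 104, 13),
             (0.68, 92, 8), (0.781, 105, 12), (0.875, 125, 17), (1.037, 154, 20)]
h0_fit, om_fit, chi2_min = fit_flat_lcdm(cc_subset)
print(f"H0 = {h0_fit:.1f} km/s/Mpc, Omega_m = {om_fit:.3f}, chi2 = {chi2_min:.2f}")
```

Because the off-diagonal systematic terms are ignored, the numbers printed by this sketch should not be compared with published constraints.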


3.1.5.2 Comparison with independent probes. 


With respect to other probes, one of the strengths of the CC method is that it is a direct probe of the Hubble parameter
H(z), instead of one of its integrals (see, e.g., Eqs. 14). As a consequence, as highlighted in Jimenez and Loeb
(2002), it is more sensitive to cosmological parameters that affect the evolution of the expansion history: a
difference in luminosity distance of 5% corresponds to a difference in H(z) of 10%. In several works the performance


z        H(z)    σ_H(z)   M   reference
0.07     69.0    19.6     F   Zhang et al. (2014)
0.09     69      12       F   Simon et al. (2005)
0.12     68.6    26.2     F   Zhang et al. (2014)
0.17     83      8        F   Simon et al. (2005)
0.179    75      4        D   Moresco et al. (2012a)
0.199    75      5        D   Moresco et al. (2012a)
0.20     72.9    29.6     F   Zhang et al. (2014)
0.27     77      14       F   Simon et al. (2005)
0.28     88.8    36.6     F   Zhang et al. (2014)
0.352    83      14       D   Moresco et al. (2012a)
0.38     83      13.5     D   Moresco et al. (2016b)
0.4      95      17       F   Simon et al. (2005)
0.4004   77      10.2     D   Moresco et al. (2016b)
0.425    87.1    11.2     D   Moresco et al. (2016b)
0.445    92.8    12.9     D   Moresco et al. (2016b)
0.47     89.0    49.6     F   Ratsimbazafy et al. (2017)
0.4783   80.9    9        D   Moresco et al. (2016b)
0.48     97      62       F   Stern et al. (2010)
0.593    104     13       D   Moresco et al. (2012a)
0.68     92      8        D   Moresco et al. (2012a)
0.75     98.8    33.6     L   Borghi et al. (2021a)
0.781    105     12       D   Moresco et al. (2012a)
0.875    125     17       D   Moresco et al. (2012a)
0.88     90      40       F   Stern et al. (2010)
0.9      117     23       F   Simon et al. (2005)
1.037    154     20       D   Moresco et al. (2012a)
1.3      168     17       F   Simon et al. (2005)
1.363    160     33.6     D   Moresco (2015)
1.43     177     18       F   Simon et al. (2005)
1.53     140     14       F   Simon et al. (2005)
1.75     202     40       F   Simon et al. (2005)
1.965    186.5   50.4     D   Moresco (2015)


Table 1: H(z) measurements (in units of km s⁻¹ Mpc⁻¹) obtained with the CC method and their associated
errors σ_H(z). The errors reported in the table represent only the diagonal part of the covariance matrix; in order to
appropriately use these data, the full covariance has to be taken into account, as discussed in Sect. 3.1.4. The last
two columns report the method (M) used to derive the differential age dt (full-spectrum fitting F, Lick indices
L, calibrated D4000 D) and the corresponding reference. We note that all these measurements are independent,
since they consider different datasets.


of CC in constraining cosmological parameters has been compared with that of other probes. In Moresco et al. (2016b)
constraints from CC were compared with those from SNe Ia and BAO considering different cosmological
models, finding that for a flat w0waCDM model the accuracy on cosmological parameters that can be obtained from
CC and BAO is comparable, and that in comparison with other probes CC are particularly useful to measure
H0 and Ωm. Similar conclusions were also found by Vagnozzi et al. (2021) and Gonzalez et al. (2021), where the results
from CC are in good agreement with those of BAO and SNe over a wide range of cosmological models. Lin
et al. (2020) focused the comparison on H0 and Ωm, confirming a good consistency between CC and an
even broader collection of cosmological probes, and also highlighting the crucial synergy between the various probes.


3.1.5.3 Constraints on cosmological parameters using CC alone and in combination with independent probes.


CC are a very attractive probe to study non-standard cosmological models, since no cosmological assumption is made 
in the derivation of H(z). For this reason, several works have explored how they can be used to place constraints on,
and provide evidence for or against, various cosmological models, from testing the consistency with concordance
models (Seikel et al., 2012) or the spatial curvature of the Universe (Vagnozzi et al., 2021; Arjona and Nesseris, 
2021), to exploring more exotic cosmological models (such as interacting dark energy models, but not only, see 
e.g. Bilicki and Seikel, 2012; Nunes et al., 2016; Colgáin and Yavartanoo, 2019; von Marttens et al., 2019; Yang
et al., 2019; Benetti and Capozziello, 2019; Aljaf et al., 2021; Ayuso et al., 2021; Reyes and Escamilla-Rivera, 2021; 
Benetti et al., 2021), to directly measuring cosmological parameters (see, e.g., Sect. 3.1.6). In particular, it has been 
found that CC are extremely useful in combination with other cosmological probes (SNe, BAO, CMB) to increase 
the accuracy on cosmological parameters (such as Ωk, Ωm and H0, see, e.g., Haridasu et al., 2018; Gómez-Valent
and Amendola, 2018; Lin et al., 2021), to determine the time evolution of the dark energy EoS (Moresco et al., 
2016a; Zhao et al., 2017; Di Valentino et al., 2020; Colgáin et al., 2021), and also to provide tighter constraints on 
the number of existing relativistic species and on the sum of neutrino masses by breaking the existing degeneracies 
between parameters (Moresco et al., 2012b, 2016a). As suggested by Linder (2017), the measured H(z) data have 
also been used in combination with the growth rate of cosmic structures to construct a new diagram to disentangle 
cosmological models (Moresco and Marulli, 2017; Basilakos and Nesseris, 2017; Bessa et al., 2021). Finally, the CC
data, in combination with BAO and SNe, have also proven to be extremely useful to test the distance-duality relation
and measure the transparency (or equivalently, the opacity) of the Universe (Holanda et al., 2013; Santos-da-Costa
et al., 2015; Chen et al., 2016b; Vavryčuk and Kroupa, 2020; Bora and Desai, 2021; Mukherjee and Mukherjee, 2021;
Renzi et al., 2021).


3.1.6 Forecasting the future impact of cosmic chronometers 


Currently, there are two main limitations in the CC method: i) the error-bars are dominated by the uncertainty 
due to metallicity and SPS model, and ii) the absence of a dedicated survey (such as for SNe or BAO) to obtain 
a statistically significant sample of CC with high spectral S/N and resolution. For the first one, as highlighted in 
Moresco et al. (2020), there is a clear path to make progress, which involves a meticulous and detailed analysis and 
comparison of the various models with high-resolution and high S/N observations of CC spectra and SEDs. This 
program appears to be feasible, enabled by current or forthcoming observational instruments and facilities (e.g., 
X-Shooter, MOONS) possibly combined with some dedicated observations. 


On the other hand, large campaigns to detect massive and passive galaxies with spectra at high S/N and resolution
are not directly foreseen at the moment, and for this science case one should rely on legacy data coming
from other planned surveys. Nevertheless, future missions either already planned (like Euclid, Laureijs et al., 2011)
or under study (ATLAS probe, Wang et al., 2019), as well as large data sets not yet fully exploited (SDSS BOSS Data
Release 16, Ahumada et al., 2020), could provide significantly larger samples of massive and passive galaxies, either in
redshift ranges previously poorly mapped (1.5 < z < 2) or previously exploited with significantly lower statistics
(0.2 < z < 0.8).


In the following, we therefore explore two different scenarios, constructing the corresponding simulations and
extracting forecasts on the expected performance of CC with future data. In the first scenario (low-z), we assume that
the available spectroscopic surveys at redshifts 0.2 < z < 0.8 (e.g. BOSS DR16) can be exploited to
obtain a sample large enough to measure 10 H(z) points with a statistical error of 1%, including in the
systematic error budget the contributions of both the IMF and the SPS models (as suggested by Moresco et al., 2020); note
that already in the analysis by Moresco et al. (2016b) the statistical error was of the order of 2-3%. In the second
scenario (high-z), we simulate CC measurements as they will be enabled by future spectroscopic surveys at
higher redshifts, producing 5 H(z) points with a statistical error of 5% at 1.5 < z < 2.1; as an example,
Euclid is expected to provide, especially with its Deep Fields, up to a few thousand very massive and passive galaxies
in this redshift range, increasing by 2 orders of magnitude the currently available statistics (Laureijs et al., 2011;
Wang et al., 2019). As a final step, we analyze the combined measurements, and also a more optimistic scenario
where the systematic error component is assumed to be minimized following the recipes described in Sect. 3.1.4 (in
particular, considering the uncertainty due to SPS models resolved, leaving just the covariance due to the
IMF contribution).


The simulated H(z) data are generated with a given error (uncorrelated across data points) assuming cosmological
parameters for the ΛCDM model from Planck Collaboration et al. (2020a), and are shown, together with the current
CC measurements, in the larger panel of Fig. 5. The associated covariance matrix is then calculated as presented
in Sect. 3.1.4, considering the contributions previously discussed. To assess the capability of the CC method to
constrain cosmological parameters, we explore the constraints that current and future data can provide on an open
wCDM cosmology, where both the spatial curvature density Ωk and the dark energy EoS are left free to vary. We
considered flat priors on [H0, Ωm, Ωde, w0] (the free parameters in our fit), and analyzed the data in a Bayesian
framework with a Markov Chain Monte Carlo (MCMC) approach using the public emcee (Foreman-Mackey et al.,
2013) python code. The results are shown in Fig. 5 and in Tab. 2.
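
The setup described above can be sketched as follows. This is a simplified illustration, not the code used for the forecasts: it keeps only the diagonal statistical errors (the systematic covariance of Sect. 3.1.4 is omitted), the fiducial values are approximate Planck-like numbers chosen here for illustration, and the prior ranges and function names are assumptions.

```python
import math
import random

def e2_open_wcdm(z, omega_m, omega_de, w):
    """Dimensionless E(z)^2 = [H(z)/H0]^2 for an open wCDM model (radiation neglected)."""
    omega_k = 1.0 - omega_m - omega_de
    a_inv = 1.0 + z
    return (omega_m * a_inv ** 3 + omega_k * a_inv ** 2
            + omega_de * a_inv ** (3.0 * (1.0 + w)))

def hubble_open_wcdm(z, h0, omega_m, omega_de, w):
    """H(z) in km/s/Mpc."""
    return h0 * math.sqrt(e2_open_wcdm(z, omega_m, omega_de, w))

def simulate_hz(redshifts, frac_err, fiducial, seed=0):
    """Draw H(z) points around the fiducial model with a fixed fractional error."""
    rng = random.Random(seed)
    data = []
    for z in redshifts:
        h_true = hubble_open_wcdm(z, *fiducial)
        sigma = frac_err * h_true
        data.append((z, rng.gauss(h_true, sigma), sigma))
    return data

def log_posterior(theta, data, priors):
    """Flat priors plus a Gaussian likelihood with diagonal errors only."""
    for value, (lo, hi) in zip(theta, priors):
        if not lo <= value <= hi:
            return -math.inf
    h0, omega_m, omega_de, w = theta
    chi2 = 0.0
    for z, h_obs, sigma in data:
        e2 = e2_open_wcdm(z, omega_m, omega_de, w)
        if e2 <= 0.0:  # unphysical corner of parameter space
            return -math.inf
        chi2 += ((h_obs - h0 * math.sqrt(e2)) / sigma) ** 2
    return -0.5 * chi2

# Illustrative fiducial (H0, Omega_m, Omega_de, w) and hypothetical flat-prior ranges
FIDUCIAL = (67.4, 0.315, 0.685, -1.0)
PRIORS = [(50.0, 100.0), (0.0, 1.0), (0.0, 1.0), (-3.0, 0.0)]
low_z = simulate_hz([0.2 + 0.6 * i / 9 for i in range(10)], 0.01, FIDUCIAL)
high_z = simulate_hz([1.5 + 0.6 * i / 4 for i in range(5)], 0.05, FIDUCIAL, seed=1)
```

A function with this signature can be handed to an MCMC sampler; with emcee, for instance, it would be passed as the log-probability function of `emcee.EnsembleSampler(nwalkers, ndim, log_posterior, args=(data, PRIORS))`.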


As discussed in Moresco et al. (2016a), H(z) measurements at low redshift are crucial to better constrain the
intercept of the Hubble parameter at z ~ 0, while measurements at higher redshift become more and more important
to determine the shape of the H(z) evolution, which depends critically on dark energy and dark matter parameters.
As expected, the simulated CC data at low-z significantly improve the current accuracy on the estimated Hubble
constant, by a factor of ≈ 2, by increasing the precision on the extrapolation of H(z) to z ~ 0. On the other hand,
the high-z simulated data become fundamental to determine the dark energy EoS, especially when combined with
lower redshift data, improving the accuracy on w from 38% to 29% and on Ωm from 59% to 31%. When considering
the optimistic scenario, CC data will enable an accuracy on H0 at the 3% level, and on Ωm and w at the ~30% level.

                   H0 [km s⁻¹ Mpc⁻¹]  % acc | Ωm        % acc | Ωde       % acc | w        % acc

open wCDM
current dataset    67.8157            11.7% | 0.227913  61%   | 0.517974  49%   | -1.6195  52%
low-z              W507               4.4%  | 0.237513  59%   | 0.592576  34%   | -1.3164  38%
high-z             67.8185            11.7% | 0.267875  40%   | 057502,   40%   | -1.5Ł96  50%
combined           71.6434            4%    | 0.294909  31%   | 0.677072  24%   | -1.wt93  29%
optimistic         TOTS               3.3%  | 0.3+9:93  27%   | 0.68915   21%   | -1.293   28%

flat ΛCDM
current dataset    66.5 ± 5.4         8.1%  | 0.347008  20.6% | -         -     | -        -
optimistic         69.0 ± 2.1         3%    | 0.3 ± 0.01  3.3% | -        -     | -        -


Table 2: Constraints with current and future CC measurements in an open wCDM (upper rows) and in a flat
ΛCDM cosmology (lower rows).

Figure 5: Forecast of CC measurements with future surveys. In the bottom left panel, current CC data are
shown with white points, while blue and yellow points present forecasts on the expected accuracy with the CC
approach respectively at low redshift (with an accurate re-analysis of current surveys, e.g. SDSS) and from future
surveys, like the ESA Euclid mission (Laureijs et al., 2011) or the ATLAS probe mission (Wang et al., 2019). For
the blue points, the error-bars are smaller than the points. The outer plots show the constraints for an open
wCDM cosmology that can be obtained with current data (gray contours), and with different combinations of the
simulated datasets.


Clearly, as the dimensionality of the problem decreases, the accuracy on the derived parameters increases. As a
comparison, in Tab. 2 we also show, for the current dataset and the optimistic scenario, the constraints on H0 and
Ωm achievable in a flat ΛCDM model. In this regime, we observe a particular improvement in the accuracy on Ωm, up
to the 3% level.


3.2 Quasars


Quasars are the most luminous persistent objects in the Universe, with integrated luminosities of 10⁴⁴⁻⁴⁸ erg s⁻¹
over the ultra-violet (UV) to the X-ray energy range. The UV emission is interpreted as the radiation produced
by the material flowing towards the supermassive black hole, located in the center of a galaxy, in the form of an
accretion disc, and it makes up to roughly 90% of the quasar bolometric budget (Shakura and Sunyaev, 1973). The
rest is released as X-rays, which are thought to originate in a hot plasma of relativistic electrons (Svensson and
Zdziarski, 1994), called corona by analogy with the Sun, that Compton up-scatter photons coming from the disc.
Quasars have long been known to obey a non-linear relation between their UV (at the rest frame
2500 Å, L_UV) and X-ray (at the rest frame 2 keV, L_X) emission (e.g., Tananbaum et al. 1979; Zamorani et al. 1981;
Avni and Tananbaum 1982, parameterized as L_X ∝ L_UV^γ, with γ ≃ 0.6), yet how the gravitational energy is partly
transferred from the disc to the corona, preventing its fast cooling via the production of X-ray photons through the
inverse Compton process, is unknown.


3.2.1 Basic idea and equations 


The technique that makes use of quasars as cosmological probes hinges on the non-linear relation mentioned above
to provide an independent measurement of their distances, thus turning quasars into standardizable candles and
extending the distance modulus-redshift relation (or the so-called Hubble-Lemaître diagram) of supernovae Ia to a
redshift range that is still poorly explored (z > 2; Risaliti and Lusso, 2015). The applicability of this methodology is
based on two key points. Firstly, most of the observed dispersion in the L_X - L_UV relation is
not intrinsic to the relation itself but due to observational issues, such as gas absorption in the X-rays, dust extinction
in the UV, calibration uncertainties in the X-rays (e.g. Lusso, 2019), variability, and selection biases associated with
the flux limits of the different samples. In fact, with an optimal selection of clean sources (i.e. where the intrinsic
UV and X-ray quasar emission can be measured), the observed dispersion drops from 0.4 dex to ~0.2 dex (Lusso and
Risaliti, 2016, 2017). Secondly, the slope of the L_X - L_UV relation does not evolve with redshift up to z ≈ 4 (i.e. the
highest redshift where the source statistics are currently sufficient to verify any possible dependence of the slope on
distance). A key consequence is that the L_X - L_UV relation must be the manifestation of a universal mechanism at
work in the quasar engines.


To fit the Hubble diagram, the distance modulus for each object should be computed first. The luminosity
distance (e.g., see Risaliti and Lusso, 2015, 2019) is derived as:


\log D_L = \frac{\log F_X - \beta - \gamma(\log F_{UV} + 27.5)}{2(\gamma - 1)} - \frac{1}{2}\log(4\pi) + 28.5 , (22)


where F_X and F_UV are the flux densities (in erg s⁻¹ cm⁻² Hz⁻¹). F_UV is normalized to the (logarithmic) value
of 27.5 in the equation above, whilst D_L is in units of cm and is normalized to 28.5 (in logarithm)⁴. The slope of
the F_X - F_UV relation, γ, is a free parameter, and so is the intercept β̂. The intercept β of the L_X - L_UV relation
is related to the one of the F_X - F_UV relation, β̂, as β̂(z) = 2(γ - 1) log D_L(z) + (γ - 1) log 4π + β. The distance
modulus, DM, is thus:


DM = 5 \log D_L - 5 \log(10\,\mathrm{pc}) , (23)


and the uncertainty on DM, dDM, is:


dDM = \frac{5}{2(\gamma - 1)} \left[ (d\log F_X)^2 + (\gamma\, d\log F_{UV})^2 + (d\beta)^2 + \left( \frac{d\gamma\,[\beta + \log F_{UV} + 27.5 - \log F_X]}{\gamma - 1} \right)^2 \right]^{1/2} , (24)


where dlog F_X and dlog F_UV are the logarithmic uncertainties on F_X and F_UV, respectively. Equation 24 assumes
that all the parameters are independent, and also takes into account the uncertainties on β and γ. The fitted
likelihood function, L, is then defined as:


\ln \mathcal{L} = -\frac{1}{2} \sum_{i}^{N} \left( \frac{(y_i - \phi_i)^2}{s_i^2} + \ln s_i^2 \right) , (25)


⁴ The values of the normalizations depend upon the luminosity range probed by the quasar sample and should be tailored accordingly.

where N is the number of sources, and s_i² = dy_i² + γ² dx_i² + exp(2 ln δ) takes into account the uncertainties on both
the x_i (log F_UV) and y_i (log F_X) parameters of the fitted relation. The parameter δ represents what is left in the
scatter of the relation once it is marginalized over all the parameters, and thus it can be considered a proxy of the
intrinsic dispersion under the assumption that all the systematics have been taken into account⁵. The variable φ is
the modeled X-ray monochromatic flux (F_X,mod), defined as:


\phi = \log F_{X,mod} = \beta + \gamma(\log F_{UV} + 27.5) + 2(\gamma - 1)(\log D_{L,mod} - 28.5) , (26)


and is dependent upon the data, the redshift and the model (cosmological or parametric) assumed for the distances
(e.g., ΛCDM, wCDM or a polynomial function). In the case of a parametric (cosmology-independent) approach, the
data are fitted with a luminosity distance described by a fifth-order polynomial of log(1 + z), where the cosmographic
function is:


D_{L,mod}(z) = k \ln(10) \frac{c}{H_0} \sum_{i=1}^{5} a_i \log^i(1 + z) , (27)


where k and a_i (a_1 is fixed to 1 to reproduce the local Hubble law) are free parameters. The polynomial order is
chosen depending upon the range of redshift spanned by the quasars to ensure convergence (see Bargiacchi et al.,
2021b).
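
The fitting ingredients of Eqs. 25-27 can be written down compactly. The sketch below is only meant to make the structure of the likelihood explicit; it is not the code of the cited works. The tuple layout of `sample`, the function names, and the unit-conversion constants are assumptions introduced here, and H0 is fixed to 70 km s⁻¹ Mpc⁻¹ as in the text.

```python
import math

C_KM_S = 299792.458      # speed of light in km/s
CM_PER_MPC = 3.0857e24   # approximate number of cm in one Mpc

def dl_cosmographic_cm(z, k, coeffs, h0=70.0):
    """Eq. 27: D_L = k ln(10) (c/H0) sum_i a_i log10(1+z)^i, returned in cm.

    coeffs = (a_1, ..., a_5); a_1 = 1 reproduces the local Hubble law.
    """
    x = math.log10(1.0 + z)
    series = sum(a * x ** (i + 1) for i, a in enumerate(coeffs))
    return k * math.log(10.0) * (C_KM_S / h0) * series * CM_PER_MPC

def log_fx_model(log_fuv, log_dl_cm, gamma, beta):
    """Eq. 26: modeled log X-ray flux density (phi)."""
    return beta + gamma * (log_fuv + 27.5) + 2.0 * (gamma - 1.0) * (log_dl_cm - 28.5)

def ln_likelihood(sample, gamma, beta, ln_delta, k, coeffs):
    """Eq. 25, with s_i^2 = dy_i^2 + gamma^2 dx_i^2 + exp(2 ln delta).

    sample: iterable of (z, log_fuv, dlog_fuv, log_fx, dlog_fx) tuples (assumed layout).
    """
    total = 0.0
    for z, log_fuv, dlog_fuv, log_fx, dlog_fx in sample:
        log_dl = math.log10(dl_cosmographic_cm(z, k, coeffs))
        phi = log_fx_model(log_fuv, log_dl, gamma, beta)
        s2 = dlog_fx ** 2 + (gamma * dlog_fuv) ** 2 + math.exp(2.0 * ln_delta)
        total += (log_fx - phi) ** 2 / s2 + math.log(s2)
    return -0.5 * total
```

Sampling ln δ rather than δ keeps the intrinsic-dispersion term positive by construction, which is why it enters as exp(2 ln δ).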


For any analysis that involves a detailed test of cosmological models, the quasar distances should be cross- 
calibrated by making use of the distance ladder through supernovae Ia. In fact, the DM values of quasars are not 
absolute, thus a cross-calibration parameter (k) is needed. The parameter k should be fitted simultaneously for 
supernovae Ia and quasars (i.e. k is a rigid shift of the quasar Hubble diagram to match the one of supernovae). 
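
For reference, the per-object distance modulus and its uncertainty (Eqs. 22-24) can be evaluated as in the sketch below. The function names and the parsec conversion constant are assumptions made here; the absolute value in the error prefactor is a choice of this sketch, introduced so that the returned uncertainty is positive for γ < 1.

```python
import math

CM_PER_10PC = 3.0857e19  # approximate number of cm in 10 pc

def log_dl_cm(log_fx, log_fuv, gamma, beta):
    """Eq. 22: log10 of the luminosity distance in cm."""
    numerator = log_fx - beta - gamma * (log_fuv + 27.5)
    return numerator / (2.0 * (gamma - 1.0)) - 0.5 * math.log10(4.0 * math.pi) + 28.5

def distance_modulus(log_fx, log_fuv, gamma, beta):
    """Eq. 23: DM = 5 log D_L - 5 log(10 pc), with D_L in cm."""
    return 5.0 * log_dl_cm(log_fx, log_fuv, gamma, beta) - 5.0 * math.log10(CM_PER_10PC)

def distance_modulus_error(log_fx, log_fuv, gamma, beta,
                           dlog_fx, dlog_fuv, dgamma, dbeta):
    """Eq. 24, assuming independent parameters."""
    slope_term = dgamma * (beta + log_fuv + 27.5 - log_fx) / (gamma - 1.0)
    quad = dlog_fx ** 2 + (gamma * dlog_fuv) ** 2 + dbeta ** 2 + slope_term ** 2
    return 5.0 / (2.0 * abs(gamma - 1.0)) * math.sqrt(quad)
```

With γ ≈ 0.6 the prefactor 5/[2(1 - γ)] is about 6, so a 0.1 dex uncertainty on log F_X alone already translates into roughly 0.6 mag on DM; this is why the dispersion of the relation is the limiting factor of the method.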


The slope of the L_X - L_UV relation can be kept fixed in the procedure above. Yet, it is better to marginalize over
γ to check whether any degeneracy of the slope with the other parameters is present, and whether the statistical
significance of any deviation from a cosmological model can be affected by the assumption of a γ value that slightly
deviates from the true one. The marginalization over γ is a more conservative procedure, hence it might reduce
the significance of any observed deviation with respect to the same MCMC analysis with γ fixed. Therefore, if a
statistical deviation persists with respect to a cosmological model even allowing for a variable γ, its significance
should be considered as an indicative lower limit with respect to the case where γ is fixed. Finally, it should be noted
that the Hubble constant H0 in Eq. 27 is degenerate with the k parameter, so it can assume any arbitrary value. In
the following, the Hubble constant is assumed to be fixed to H0 = 70 km s⁻¹ Mpc⁻¹ (see also Lusso et al., 2019a, 2020;
Bargiacchi et al., 2021b).


3.2.2 Sample selection 


To build a quasar sample that can be utilized for cosmological purposes, both X-ray and UV data are required to
cover the rest-frame 2 keV and 2500 Å. The most up-to-date broad-line quasar sample considered for cosmological
purposes has been assembled by combining seven different samples from both the literature and the public archives
(Lusso et al., 2020). The former group includes the samples at z ≃ 3.0-3.3 by Nardini et al. (2019), 4 < z < 7 by
Salvestrini et al. (2019), z > 6 by Vito et al. (2019), the XMM-XXL North quasar sample published by Menzel et al.
(2016), and one new optically-selected SDSS quasar at z = 4.109, J074711.14+273903.3, whose X-ray observation
was obtained as part of a proposed large programme with XMM-Newton (cycle 18, proposal ID: 084497, PI: Lusso).
This collection is complemented by quasars from a cross-match of optical (i.e. the Sloan Digital Sky
Survey) and X-ray public catalogs (i.e. XMM-Newton and Chandra), which will be labeled as the SDSS-4XMM and
SDSS-Chandra samples hereafter (Bisogni et al., 2021). A local subset of active galactic nuclei (AGN) with UV (i.e.
International Ultraviolet Explorer) data and X-ray archival information was also added to improve the sampling at
very low redshifts. The reader interested in the description of the different subsets should refer to Lusso et al. (2020).
The main parent sample is composed of ~19,000 objects, from local up to z = 7.52, where quasars with bright radio
jets and broad absorption lines (BALs) have been removed. In fact, an excess of X-rays due to synchrotron emission
is observed in bright radio quasars due to the presence of the jet, whilst the strong absorption features observed in
BALs, usually attributed to winds/outflows, hamper a robust measurement of the quasar continuum in the UV.


⁵ δ = 0 means that all the observed dispersion is intrinsic.

Figure 6: Distribution of luminosities at rest-frame 2500 Å as a function of redshift for the main (grey points,
~19,000 objects) and the selected (cleaned) samples (Lusso et al., 2020). Brown and yellow squares show the
high-z sample (Salvestrini et al., 2019; Vito et al., 2019), cyan points the SDSS-4XMM one, brown triangles the
XMM-XXL one (Menzel et al., 2016), orange pentagons the local AGN sample, red stars the z ~ 3 quasar sample
(Nardini et al., 2019), green star the new z ~ 4 quasar, and gold pentagons the SDSS-Chandra one (Bisogni et
al. A&A submitted). Image reproduced with permission from Lusso et al. (2020), copyright by Astronomy &
Astrophysics.


To select a sub-sample with accurate estimates of F_X and F_UV, systematic effects should be taken into account
and low-quality measurements should be discarded. A minimum signal-to-noise (S/N) of 1 on the soft and hard X-ray
band fluxes should be required, whilst no such filter is needed in the UV since the S/N at these wavelengths is
typically significantly higher than 1. The main possible sources of contamination or systematic error that may affect
the flux measurements are: dust reddening and host-galaxy contamination in the optical/UV, gas absorption in the
X-rays, and the Eddington bias associated with the flux limit of the X-ray observations.


Regarding the latter, any flux-limited sample is biased towards brighter sources at high redshifts, and this should
be more relevant to the X-rays, since the relative observed flux range is narrower than in the UV. Specifically, AGN
with an average X-ray intensity close to the flux limit of the observation will be observed only in case of a positive
fluctuation. This introduces a systematic, redshift-dependent bias towards high fluxes, known as Eddington bias,
which has the effect of flattening the F_X - F_UV relation. Samples containing only detections might thus be affected
by such a bias. One possibility is to include censored data in the analysis. Yet, in that case the investigation of both the F_X - F_UV
and the distance modulus-redshift relations is far from trivial, since it strongly depends on the weights assumed in
the fitting algorithm. Therefore, one needs to find an alternative method to obtain an (almost) unbiased sample.


To minimize this bias, one possible approach is to neglect all X-ray detections below a threshold defined as κ
times the intrinsic dispersion of the F_X - F_UV relation (δ) computed in narrow redshift intervals (Lusso and Risaliti
2016; Risaliti and Lusso 2019), specifically:


\log F_{\mathrm{2\,keV,exp}} - \log F_{\mathrm{min}} < \kappa\delta , (28)


where F_2keV,exp is the monochromatic flux at 2 keV expected from the observed rest-frame quasar flux at 2500 Å
with the assumption of a true γ of 0.6; it is calculated as follows:


\log F_{\mathrm{2\,keV,exp}} = (\gamma - 1)\log(4\pi) + (2\gamma - 2)\log D_L + \gamma \log F_{UV} + \beta , (29)


where D_L is the luminosity distance calculated for each redshift with a fixed cosmology, and the parameter β
represents the pivot point of the non-linear relation in luminosities, β = 26.5 - 30.5γ ≈ 8.2⁶. The parameter F_min in
Eq. 28 represents the flux limit of a given observation or survey, whilst the product κδ is a value that should be
estimated for all the sub-samples constructed from archives (e.g., SDSS-4XMM, SDSS-Chandra) or surveys (XXL).
The Eddington bias is then reduced by including only X-ray detections for which the minimum detectable flux F_min
in that given observation is lower than the expected X-ray flux F_2keV,exp by a factor that is proportional to the
dispersion in the F_X - F_UV relation in narrow redshift bins (see Appendix A in Lusso and Risaliti 2016 and Risaliti
and Lusso 2019).
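
The selection defined by Eqs. 28-29 amounts to a simple per-object test, sketched below. The function names are assumptions introduced here, and the value of κδ must be estimated separately for each sub-sample as described above; any number passed in an example call is a placeholder, not a recommended threshold.

```python
import math

def expected_log_f2kev(log_fuv, log_dl_cm, gamma=0.6):
    """Eq. 29: expected log flux density at 2 keV from the rest-frame 2500 A flux.

    beta is the pivot point of the luminosity relation, beta = 26.5 - 30.5 * gamma.
    log_dl_cm is log10 of the luminosity distance in cm for a fixed cosmology.
    """
    beta = 26.5 - 30.5 * gamma
    return ((gamma - 1.0) * math.log10(4.0 * math.pi)
            + (2.0 * gamma - 2.0) * log_dl_cm
            + gamma * log_fuv + beta)

def passes_eddington_cut(log_fuv, log_dl_cm, log_fmin, kappa_delta, gamma=0.6):
    """Keep a detection only if it is NOT rejected by Eq. 28.

    Eq. 28 rejects objects with log F_exp - log F_min < kappa * delta.
    """
    return expected_log_f2kev(log_fuv, log_dl_cm, gamma) - log_fmin >= kappa_delta
```

A source at the pivot luminosity (log L_UV = 30.5) is expected to have log L_X = 26.5 by construction, which provides a quick sanity check of the implementation.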


A complete description and implementation of these filters to obtain the final best sample for a cosmological
analysis is presented in Lusso et al. (2020, see their Section 5). The most up-to-date quasar sample is composed of
2,421 quasars spanning a redshift interval 0.009 < z < 7.52, with a mean (median) redshift of 1.442 (1.295), and it is
shown in Fig. 6.


3.2.3 Measurements 


Ideally, spectroscopy can deliver cleaner measurements of the relevant parameters (i.e. the X-ray and UV rest-frame
fluxes), but since a detailed spectroscopic UV and X-ray analysis can be carried out only for a relatively small number
of sources, the currently published quasar sample still relies heavily on broadband photometry in both UV and
X-rays to compute the monochromatic UV and X-ray fluxes, as well as the UV colors and X-ray slopes. These
parameters are thus derived from the photometric AGN spectral energy distribution (SED).


To compile the quasar SEDs, multi-wavelength data from radio to UV should be considered, such as the FIRST
survey in the radio (Becker et al., 1995), the Wide-field Infrared Survey Explorer (WISE, Wright et al., 2010) in the
mid-infrared, the Two Micron All Sky Survey (2MASS, Cutri et al., 2003; Skrutskie et al., 2006) and the UKIRT Infrared
Deep Sky Survey (UKIDSS, Lawrence et al., 2007) in the near-infrared, SDSS in the optical and the Galaxy Evolution
Explorer (GALEX, Martin et al., 2005) survey in the UV. Most of the relevant broadband information, as well as
the spectroscopic redshifts, are compiled in the SDSS quasar catalogs. Galactic reddening must be taken into
account by utilizing the selective attenuation of the stellar continuum k(λ) (e.g. Fitzpatrick, 1999), along with the
relative Galactic extinction (e.g. Schlegel et al., 1998) for each object. For each source, the observed flux and the
corresponding frequency in all the available bands should be computed. The data used in the SED computation are
then blue-shifted to the rest-frame (with no K-correction). All the rest-frame luminosities are then determined from a
first-order polynomial between two adjacent points. At wavelengths bluer than about 1400 Å, significant absorption
by the intergalactic medium (IGM) is expected in the continuum (~10% between the Lyα and C IV emission lines,
see Lusso et al., 2015, for details). Hence, when computing the relevant parameters, all the rest-frame data at
λ < 1500 Å should be excluded from the SED (or corrected for such an absorption if possible).


By compiling a broad photometric coverage, the rest-frame luminosity at 2500 Å can be computed via interpolation
for the majority of the quasars whenever the reference frequency is covered by the photometric SED. Otherwise,
the value can be extrapolated by considering the slope between the luminosity values at the closest frequencies.
Uncertainties on monochromatic luminosities (L_ν ∝ ν^(-γ)) from the interpolation (extrapolation) between two values
L_1 and L_2 are computed as:


\delta L = \sqrt{ \left(\frac{\partial L}{\partial L_1}\right)^2 (\delta L_1)^2 + \left(\frac{\partial L}{\partial L_2}\right)^2 (\delta L_2)^2 } . (30)
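
A possible implementation of this interpolation and of the error propagation of Eq. 30 is sketched below. It assumes that the first-order polynomial is taken in log-log space (i.e. a local power law between the two photometric points), which is an assumption of this sketch rather than a statement about the published pipeline; the function name is likewise hypothetical.

```python
import math

NU_2500 = 2.99792458e18 / 2500.0  # rest-frame frequency of 2500 A in Hz

def interpolate_luminosity(nu, nu1, l1, dl1, nu2, l2, dl2):
    """Power-law inter/extrapolation of L_nu to frequency nu, with its uncertainty.

    (nu1, l1, dl1) and (nu2, l2, dl2) are the two closest rest-frame SED points.
    Returns (L, delta_L), with delta_L from Eq. 30.
    """
    # Fractional position of nu between nu1 and nu2 in log space;
    # t < 0 or t > 1 corresponds to an extrapolation.
    t = (math.log10(nu) - math.log10(nu1)) / (math.log10(nu2) - math.log10(nu1))
    lum = l1 ** (1.0 - t) * l2 ** t
    # Partial derivatives of L with respect to L1 and L2
    dlum_dl1 = (1.0 - t) * lum / l1
    dlum_dl2 = t * lum / l2
    err = math.sqrt((dlum_dl1 * dl1) ** 2 + (dlum_dl2 * dl2) ** 2)
    return lum, err
```

When the target frequency coincides with one of the two points, the result reduces to that point and its own uncertainty, as expected; for extrapolations the weights exceed unity in absolute value and the propagated error grows accordingly.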


To obtain the rest-frame luminosities at 2 keV, a detailed X-ray spectral analysis of all the quasars is impractical,
given the overall large number of sources, while a photometric approach is a viable solution (Risaliti and Lusso,
2019; Lusso et al., 2020). Briefly, for sources having an entry in the 4XMM-DR9 serendipitous source catalog⁷, the
rest-frame 2 keV fluxes and the relative (photometric) photon indices, Γ_X (along with their 1σ uncertainties), can
be derived from the tabulated 0.5-2 keV (soft, F_S) and 2-12 keV (hard, F_H) fluxes. These band-integrated fluxes are
blue-shifted to the rest-frame by considering a pivot energy value of 1 keV (E_S) and 3.45 keV (E_H), respectively, and
by assuming the same photon index used to derive the fluxes in the 4XMM catalog (i.e. Γ_X = 1.42, Webb et al.


⁶ We note again that the value of the luminosity normalizations should be chosen based on the average values for the entire sample.
⁷ http://xmm-catalog.irap.omp.eu/

Figure 7: Distance modulus-redshift relation (Hubble diagram) for the clean quasar sample and Type Ia supernovae
(Pantheon, magenta points). Symbol keys are the same as in Figure 6. The red line represents a fifth-order
cosmographic fit of the data, whilst the black points are averages (along with their uncertainties) of the
distance moduli in narrow (logarithmic) redshift intervals. The dashed black line shows a flat ΛCDM model fit
with Ωm = 0.3. The bottom panel shows the residuals with respect to the cosmographic fit, and the black points
are the averages of the residuals over the same redshift intervals. Image reproduced with permission from Lusso
et al. (2020), copyright by Astronomy & Astrophysics.


2020). For the soft band, the monochromatic flux at ES is then:

    F_E(E_S) = F_S \, \frac{(2 - \Gamma_X)\, E_S^{\,1 - \Gamma_X}}{(2\,\mathrm{keV})^{2 - \Gamma_X} - (0.5\,\mathrm{keV})^{2 - \Gamma_X}} ,    (31)

in units of erg s⁻¹ cm⁻² keV⁻¹. An equivalent expression holds for the hard band, with the obvious modifications.


Flux values must be corrected for Galactic absorption. The photometric photon index is then estimated from the
slope of the power law connecting the soft and hard monochromatic fluxes at the rest-frame energies corresponding
to the observed pivot points. The rest-frame photometric 2 keV flux (and its uncertainty) is interpolated
(or extrapolated) based on such a power law. A similar approach can be adopted for any X-ray catalog (e.g., the
Chandra source catalog⁸, see Bisogni et al. 2021).
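The photometric procedure can be sketched in a few lines of Python. The example below implements Eq. 31 for a generic band, derives the photometric photon index from the two monochromatic fluxes, and evaluates the power law at the observed energy that corresponds to rest-frame 2 keV. It is a minimal sketch under stated assumptions: the function names are hypothetical, fluxes are assumed to be already corrected for Galactic absorption, energies are in keV, and neither the uncertainties nor any K-correction factor are handled.

```python
import math

def mono_flux(band_flux, e_lo, e_hi, e_pivot, gamma=1.42):
    """Monochromatic flux at e_pivot (Eq. 31) from a band-integrated flux,
    assuming a power law with photon index gamma within [e_lo, e_hi]."""
    k = 2.0 - gamma
    if abs(k) < 1e-12:
        norm = math.log(e_hi / e_lo)          # limiting case gamma = 2
    else:
        norm = (e_hi ** k - e_lo ** k) / k
    return band_flux * e_pivot ** (1.0 - gamma) / norm

def photometric_index(f_soft, f_hard, e_soft=1.0, e_hard=3.45):
    """Photon index from the slope between the two monochromatic fluxes,
    using F_E proportional to E**(1 - gamma)."""
    return 1.0 - math.log(f_hard / f_soft) / math.log(e_hard / e_soft)

def flux_at_rest_2kev(f_soft, f_hard, z, e_soft=1.0, e_hard=3.45):
    """Flux density at the observed energy 2/(1+z) keV, i.e. rest-frame 2 keV,
    interpolated or extrapolated along the soft-hard power law."""
    gamma = photometric_index(f_soft, f_hard, e_soft, e_hard)
    e_obs = 2.0 / (1.0 + z)
    return f_soft * (e_obs / e_soft) ** (1.0 - gamma), gamma

if __name__ == "__main__":
    # Illustrative band fluxes in erg/s/cm^2 (not taken from any catalog)
    f_s = mono_flux(2.0e-14, 0.5, 2.0, 1.0)
    f_h = mono_flux(3.0e-14, 2.0, 12.0, 3.45)
    print(flux_at_rest_2kev(f_s, f_h, z=1.5))
```

Since the slope of a power law is the same in the observed and rest frames, the photon index can be computed directly from the observed pivot energies.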


3.2.4 Main results and forecasts 


Quasars have now been extensively used to determine cosmological constraints by fitting their Hubble diagram in
combination with that of Type Ia supernovae, as discussed in Sect. 3.2.1 (e.g. Lopez-Corredoira et al., 2016; Bisogni
et al., 2017; Lusso et al., 2019b; Melia, 2019; Wei and Melia, 2020; Demianski et al., 2020; Zhao and Xia, 2021; Li
et al., 2021; Bargiacchi et al., 2021a; Leizerovich et al., 2021). Amongst the main results, it has been found that the
expansion rate of the Universe based on the combined quasar and Type Ia supernova Hubble diagram shows a deviation
from the concordance model at high redshifts (z > 1.4), with a statistical significance of ~3–4σ. Figure 7 presents
the Hubble diagram for the most up-to-date samples of quasars (Lusso et al., 2020) and Type Ia supernovae from
the Pantheon survey (Scolnic et al., 2018). The best MCMC cosmographic fit (see Eq. 27) is shown with a red


⁸https://cxc.cfa.harvard.edu/csc/


Figure 8: Marginalized posterior distributions (1, 2 and 3σ) of the (w0, wa) parameters for the combined quasar
(Lusso et al., 2020) and Type Ia supernova (Scolnic et al., 2018) samples (blue contours). The constraints from the
combination of Planck TT,TE,EE+lowE+lowl + BAO are also shown (green contours, Planck Collaboration
et al. 2020a). The dashed lines mark the point corresponding to the ΛCDM model. The resulting (w0, wa) for the
combined quasars + SNe are statistically consistent with the phantom regime (w < −1) and at variance with the
ΛCDM model at more than the 3σ statistical level.


Model           Ωm            ΩΛ          w0         wa
flat ΛCDM       0.295+9-013   -           -          -
oΛCDM           0.514903      1,104995    -          -
flat w0waCDM    0.454-02      -           -1.3402    -40t37


Table 3: Summary of the cosmological constraints for the combined quasar (Lusso et al., 2020) and Type Ia supernova
(Scolnic et al., 2018) sample for three different cosmological models: flat ΛCDM, open ΛCDM (oΛCDM) and
flat w0waCDM (see Bargiacchi et al., 2021a, for more details).


line, whilst black points are the means (along with the uncertainty on the mean) of the distance modulus in narrow 
(logarithmic) redshift intervals, plotted for visualization purposes only. The residuals are displayed in the bottom 
panel with the same symbols, and do not reveal any apparent trend with redshift. The MCMC fit assumes uniform 
priors on the parameters (see Bargiacchi et al., 2021b, for more details on the cosmographic technique employed). 


The constraints on w0 and wa in a w0waCDM cosmological model combining the latest quasar and supernova
samples are shown in Fig. 8. The constraints from the combination of Planck18 (Planck Collaboration et al., 2020a)
TT,TE,EE+lowE+lowl + BAO are also shown for reference. The dashed lines mark the point corresponding to the
flat ΛCDM model, i.e. w0 = −1 and wa = 0. The resulting (w0, wa) for the combined quasars+SNe are statistically
consistent with the phantom regime (w < −1) and at variance with the ΛCDM model at more than the 3σ statistical
level. A summary of the cosmological fits to the combined quasar and supernova samples is presented in Tab. 3.
The cosmological implications of this deviation and its statistical significance are discussed
at length by Risaliti and Lusso (2019) and Lusso et al. (2019a).


With currently operating facilities, dedicated observations of well-selected high-z quasars will greatly improve
the test of the cosmological model and the study of the dispersion of the LX–LUV relation, especially at z ~ 4
and beyond. The extended Roentgen Survey with an Imaging Telescope Array (eROSITA, Predehl 2012; Merloni
et al. 2012), flagship instrument of the ongoing Russian Spektrum-Roentgen-Gamma (SRG) mission, will represent
a powerful and versatile X-ray observatory in the next decade. The eROSITA sky will be dominated by the AGN
population, with ~3 million AGN with a median redshift of z ~ 1 expected by the end of the nominal 4-year all-sky
survey at the sensitivity of F0.5–2keV ≃ 10⁻¹⁴ erg s⁻¹ cm⁻², and for which extensive multi-wavelength follow-ups
are already planned. Concerning the constraints on the cosmological parameters (such as Ωm, ΩΛ, and w) through
the Hubble diagram of quasars, the 4-year eROSITA all-sky survey alone, complemented by redshift and broadband
photometric information, will supply the largest quasar sample at z < 2 (average redshift z ~ 1). Nonetheless, a
relatively small population should survive the Eddington bias cut at higher redshifts (see, e.g., Medvedev et al. 2020
for the highest redshift radio bright quasar), thus being available for cosmology as eROSITA samples the brighter
end of the X-ray luminosity function (Lusso, 2020, but see also section 6.2 in Comparat et al. 2020). The large
number of eROSITA quasars at z ~ 1 will be essential for both a better cross-calibration of the quasar Hubble
diagram with supernovae and a more robust determination of ΩΛ, which is sensitive to the shape of the low-redshift
part of the distance modulus-redshift relation (see Figure 2 in Lusso, 2020). In the mid and long term, surveys from
Euclid and LSST in the optical and UV, and Athena in the X-rays, will also provide statistical samples of millions of
quasars. With these datasets, it will be possible to obtain constraints on the observed deviations from the standard
cosmological model, which will rival and complement those available from the other cosmological probes.


3.2.5 Systematic effects 


This method may still have several shortcomings, thus it is mandatory to demonstrate that the observed deviation
from ΛCDM at redshift z > 2 is driven neither by systematics in the quasar sample selection nor by the procedure
adopted to fit the quasar Hubble–Lemaître diagram. Potential convergence issues may arise from the use of the
polynomial expansion (Eq. 27) to fit the Hubble diagram when observational data go beyond z ~ 1 (see Bargiacchi
et al. 2021b for an in-depth discussion). Moreover, the choice of the monochromatic luminosities at 2500 Å and 2 keV
is rather arbitrary, and mostly based on historical reasons. It is possible that the LX–LUV relation is tighter with a
different choice of the indicators of UV and X-ray emission (see e.g. Young et al., 2010). A careful analysis of this issue
may also provide new insights as to the physical process responsible for this relation. A small fraction of moderate/bright
radio sources may still be present in the sample. Deep all-sky radio surveys and multi-wavelength approaches (Mingo
et al., 2016) are necessary to better remove these sources from the clean samples.


One serious issue that could affect the precision of the flux estimates at X-rays is gas absorption. Previous studies
based on large AGN surveys show that about 25% of optically selected unobscured AGN display some level of X-ray
absorption (Merloni et al., 2014) in excess of the Galactic value. If not corrected for, this absorption leads to an
underestimate of the X-ray flux, and an overestimate of the distance. As absorption mostly affects the low-energy
part of the X-ray spectrum, this bias is expected to be more relevant at low redshift (z < 1). Nonetheless, the global
effect on the Hubble diagram will be a decrease of the ratio between high-redshift and low-redshift distances, i.e.,
qualitatively, this effect may lead to a discrepancy with the concordance model. In fact, including AGN with ΓX < 1.5
produces a flattening of the Hubble diagram, as expected if absorbed sources start to contaminate the sample. A
conservative threshold should thus be ΓX > 1.7. Therefore, sources with an X-ray photon index below that value
are removed from the sample.


Work still needs to be done regarding the effect of the X-ray and UV variability on the relation (Lusso and Risaliti,
2016). Variations in the UV brightness are on the order of about 10% (i.e. 0.04 dex in logarithmic units) on time
scales of months to years (e.g. Vanden Berk et al., 2001). The X-ray variability is on the order of 5% on long time
scales at high luminosity and somewhat larger at lower luminosity (e.g. Zheng et al., 2017), and it represents about
30% of the dispersion of the X-ray/UV relation overall (about 0.12 dex compared to the observed 0.24 dex, see Lusso
and Risaliti, 2016, for details). Moreover, it is well known that the UV and X-ray variability are not correlated on
short timescales (e.g., NGC 5548, Edelson et al. 2015), so the intrinsic variance on the relation could be even lower
than 0.1 dex. Yet, regarding the X-ray variability, the increase of dispersion due to variability does not modify the
slope of the relation (Lusso and Risaliti, 2016), even when using simultaneous datasets (Grupe et al., 2010; Wu et al.,
2012; Lusso and Risaliti, 2016). Although in the case of low fluxes X-ray and UV variability may bias our data
towards brighter states, the only effect of both X-ray and UV variability is to produce larger uncertainties on the
final computation of the parameters, without introducing any major systematic.


Another key issue that could affect the analysis of the distance modulus-redshift relation is the correction for
the Eddington bias, which flattens the FX–FUV relation and thus the Hubble diagram, especially at high redshifts.
At present, such a correction comes at the expense of the sample statistics. Depending on the flux limit of the given
observation/survey, the size of the parent sample may drop by more than 50%. Additionally, the
assumption that the true slope of the FX–FUV relation is γ = 0.6 may leave some hidden trends in the residuals of the
Hubble diagram as a function of redshift. Nonetheless, the analysis of the residuals of the Hubble diagram as a
function of redshift and γ for different values of the threshold κδ does not show any obvious trend (see Sect. 9.1 of
Lusso et al. 2020 and appendix A in Lusso and Risaliti 2016).


The presence of an additional contribution of dust reddening in the UV band should be considered amongst 
the possible residual (and redshift-dependent) observational systematics in the Hubble diagram. Going to higher 
redshifts, the rest-frame optical/UV spectra shift to higher (shorter) frequencies (wavelengths), where the dust 
absorption cross-section is higher. This might lead to underestimated FUV measurements, which would imply an intrinsically
larger value of the luminosity distance (and thus the distance modulus) than the measured one (see Section 9.4 of
Lusso et al. 2020 for details).


3.3 Gamma-Ray Bursts


Observations of SNe Ia obtained at the end of the 1990s by two different teams (Perlmutter et al., 1998, 1999; Riess et al.,
1998; Schmidt et al., 1998) found that, starting from z ~ 0.5, SNe Ia appeared dimmer by ~0.25 mag. Given the nature
of SNe Ia as standard candles (Phillips, 1993), this result suggested that we are living in a Universe characterized
by an accelerated expansion. In the following decades, other cosmological probes (e.g. CMB and BAO) provided
further support to the existence of an unknown form of “dark energy” propelling the acceleration. By combining
SNe data with the constraints from CMB measurements, several groups (e.g., Riess et al., 2004) found w0 ~ −1
and wa ~ 0. This result might identify the dark energy as originating from a genuine cosmological constant. In
subsequent years, new SN surveys have shown that the growing number of SN discoveries (Fig. 9) has not translated
into a comparable gain in the accuracy of cosmological parameter measurements. This is likely due to the fact
that SN observations are affected by numerous sources of systematic effects, such as different classes of progenitor
systems and different explosion mechanisms, anomalous reddening laws, and contamination of the Hubble diagram by
non-standard SNe Ia and/or bright SNe Ibc. Taking advantage of the existence of this “systematic wall”, some
authors (e.g., Nielsen et al., 2016) have questioned, on a statistical basis, the evidence for cosmic acceleration from SNe
Ia. In fact, SNe Ia detected in the Supernova Legacy Survey (e.g., Astier et al., 2006; Guy et al., 2010) confirm the
acceleration, although their measurements suggest different values for the cosmological parameters. The cosmological
interpretation of the 0.25 mag dimming of SN Ia peaks relies on the absence of evolutionary effects in their progenitors.
In the following, we describe three methods, based on gamma-ray bursts, to measure Ωm independently of SNe Ia,
and to constrain the dark energy EoS aimed at describing the expansion history of the Universe.


3.3.1 Basic idea and equations 


Gamma-ray bursts (GRBs) are detectable up to the first hundreds of millions of years after the Big Bang thanks to the
enormous energy that they release in X/gamma-rays (the isotropic radiated energy, Eiso, can reach ≃ 10^54 erg),
with a redshift distribution ranging from z ≃ 0.01 up to z ~ 9. Therefore, these phenomena are very promising
probes for investigating the history and evolution of the Universe, understanding the nature and evolution of dark
energy, and testing alternative cosmological models. For recent general reviews on the GRB phenomenon, we refer to
Mészáros (2002); Zhang (2014); Kumar and Zhang (2015); Pe’er (2015). Although GRBs are not standard candles,


Figure 9: Residual distance modulus for different values of the density cosmological parameters up to z = 2.0. We
consider the best fit to be the standard ΛCDM model, where Ωm = 0.27, ΩΛ = 0.73, and H0 = 71 km s⁻¹ Mpc⁻¹ (black
line). Union2 SNe Ia data residuals are shown in grey. The large spread (more than 1 mag) shown by μ at z =
1.5 and at z = 0.145 (the two vertical dashed lines), where the scatter is almost 0.2 mag, is clearly evident. Image
reproduced with permission from Izzo et al. (2015), copyright by ESO.


Correlation                    Reference
Ep,i – Eiso                    Amati et al. (2002)
Ep,i – Eγ                      Ghirlanda et al. (2004)
Ep,i – Liso                    Yonetoku et al. (2004)
Lpeak – τlag                   Azzam (2012)
Liso – V                       Fenimore and Ramirez-Ruiz (2000)
Liso – Ep,i – T0.45            Firmani et al. (2006)
Liso – Ep,i – tbreak           Liang and Zhang (2005)
LX – Ta                        Dainotti et al. (2008)
EX,iso – Eγ,iso – Epk          Bernardini et al. (2012)
Eγ,iso – EX,iso – Epk          Izzo et al. (2015)


Table 4: List of the most investigated GRB correlations.


as their peak luminosity and radiated energy span several orders of magnitude, some empirical correlations between
distance-dependent quantities and rest-frame observables have opened up the possibility of using GRBs as distance
indicators (see, for instance, Amati et al., 2008; Amati and Della Valle, 2013; Lin et al., 2015, 2016a,b; Wei and
Wu, 2017; Si et al., 2018; Fana Dirirsa et al., 2019; Khadka and Ratra, 2020; Zhao et al., 2020). From a
phenomenological point of view, GRBs show a prompt emission, consisting of high-energy photons (γ-rays and hard
X-rays), and an afterglow emission, which is a long-lasting multi-wavelength emission from X-rays to infrared and
sometimes also radio, which follows the prompt emission and shows a typical power-law decay (e.g., Gehrels et al.,
2009). In addition, GRBs can be generally classified into short (with duration T90 < 2 s, SGRBs) and long (with
T90 > 2 s, LGRBs; Kouveliotou et al. 1993), where T90 is the time interval in which 90% of the GRB fluence
is accumulated, starting from the time at which 5% of the total fluence was detected. The classification is very
important for standardizing GRBs since most of these correlations hold for long GRBs only. In Tab. 4 we list some
of the correlations widely investigated in the literature, based on both prompt and afterglow emission properties (see
references above for the definitions of the parameters mentioned in the Table).
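The definition of T90 given above translates directly into a short computation on a binned, background-subtracted light curve. The following Python sketch is only illustrative: the function names are hypothetical, the counts are assumed to be non-negative and already background-subtracted, and the cumulative fluence is interpolated linearly within each bin.

```python
import numpy as np

def t90(bin_edges, counts):
    """Time interval between the 5% and 95% levels of the cumulative fluence.

    bin_edges : array of n + 1 time-bin edges (s)
    counts    : array of n background-subtracted counts per bin
    """
    bin_edges = np.asarray(bin_edges, dtype=float)
    counts = np.asarray(counts, dtype=float)
    # Normalized cumulative fluence evaluated at the bin edges (0 at the first edge)
    cum = np.concatenate([[0.0], np.cumsum(counts)]) / np.sum(counts)
    t05 = np.interp(0.05, cum, bin_edges)
    t95 = np.interp(0.95, cum, bin_edges)
    return t95 - t05

def classify(t90_s):
    """Short/long classification with the conventional 2 s boundary."""
    return "short" if t90_s < 2.0 else "long"

if __name__ == "__main__":
    edges = np.linspace(0.0, 10.0, 101)
    flat_burst = np.ones(100)          # toy light curve with constant count rate
    duration = t90(edges, flat_burst)
    print(duration, classify(duration))
```

Real light curves require a careful background model before the cumulative fluence is built, which is outside the scope of this sketch.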


Throughout this Section, we mostly focus on the Ep,i–Eiso correlation for measuring cosmological parameters
and investigating dark energy properties and evolution. In addition, as an example of the potential of combining
prompt and afterglow emission properties, we will also discuss the perspectives for cosmology of the so-called Combo
relation (Izzo et al., 2015), obtained by combining the Eγ,iso–EX,iso–Ep,i and the Ep,i–Eiso correlations with the analytical
formulation of the X-ray afterglow component given by Ruffini et al. (2014).


3.3.1.1 The Ep,i–Eiso (“Amati”) correlation


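GRBs show non-thermal spectra which can be empirically modeled with the Band function (Band et al., 1993), which
is a smoothly broken power law with parameters α, the low-energy spectral index, β, the high-energy spectral index,
and the roll-over energy E0:

    N(E) = \begin{cases}
      A \left( \dfrac{E}{100\,\mathrm{keV}} \right)^{\alpha} \exp\left( -\dfrac{E}{E_0} \right) , & (\alpha - \beta) E_0 \ge E , \\
      A \left( \dfrac{(\alpha - \beta) E_0}{100\,\mathrm{keV}} \right)^{\alpha - \beta} \exp(\beta - \alpha) \left( \dfrac{E}{100\,\mathrm{keV}} \right)^{\beta} , & (\alpha - \beta) E_0 \le E .
    \end{cases}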


Given that β is almost always found to be < −2, GRB νFν spectra show a peak corresponding to a value of the
photon energy Ep = E0(2 + α) (Fig. 10), ranging typically from ~5–10 up to 1000–5000 keV (see, e.g., Zhang, 2014).
For those GRBs with well measured prompt emission spectrum and redshift, it is possible to evaluate the “intrinsic”
(i.e., in the cosmological rest-frame) peak energy, Ep,i = Ep(1 + z), and the isotropic-equivalent radiated energy,
defined as:

    E_{iso} = \frac{4\pi D_L^2(z, \theta)}{1 + z} \int_{1/(1+z)}^{10^4/(1+z)} E\, N(E)\, dE .    (32)


Figure 10: A typical νFν spectrum of a GRB.


The quantity Eiso spans several orders of magnitude, typically ranging from 10^50 to 10^54 erg. It is important to
note that, while there is observational and theoretical evidence suggesting that the GRB emission is collimated
within a few tens of degrees or less, we are still lacking a firm and reliable method for estimating the jet opening
angle of single GRBs. This is why, conservatively, Eiso, or the isotropic-equivalent peak luminosity, Liso, are still
used as indicators of the GRB “brightness”.
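To make the quantities entering Eq. 32 concrete, the following Python sketch evaluates the Band function and integrates E N(E) numerically over the 1–10^4 keV rest-frame band. It is a minimal example under stated assumptions: energies are in keV, N(E) is taken to be the time-integrated photon spectrum in photons cm⁻² keV⁻¹, the luminosity distance is supplied by the user in Mpc, and all function names and parameter values are hypothetical.

```python
import numpy as np

KEV_TO_ERG = 1.602176634e-9        # 1 keV in erg
MPC_TO_CM = 3.0856775814913673e24  # 1 Mpc in cm

def band(E, A, alpha, beta, E0):
    """Band function N(E), with energies in keV and a 100 keV pivot."""
    E = np.asarray(E, dtype=float)
    E_break = (alpha - beta) * E0
    low = A * (E / 100.0) ** alpha * np.exp(-E / E0)
    high = (A * (E_break / 100.0) ** (alpha - beta)
            * np.exp(beta - alpha) * (E / 100.0) ** beta)
    return np.where(E <= E_break, low, high)

def e_iso(z, d_l_mpc, A, alpha, beta, E0, n=4000):
    """Isotropic-equivalent energy (erg) following Eq. 32."""
    # Observer-frame limits corresponding to 1 - 10^4 keV in the rest frame
    E = np.logspace(np.log10(1.0 / (1.0 + z)), np.log10(1.0e4 / (1.0 + z)), n)
    integrand = E * band(E, A, alpha, beta, E0)
    # Trapezoidal rule; the result is the bolometric fluence in keV cm^-2
    fluence_kev = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(E))
    d_l_cm = d_l_mpc * MPC_TO_CM
    return 4.0 * np.pi * d_l_cm ** 2 * fluence_kev * KEV_TO_ERG / (1.0 + z)

if __name__ == "__main__":
    alpha, beta, E0 = -1.0, -2.3, 200.0      # illustrative spectral parameters
    print("observed peak energy (keV):", E0 * (2.0 + alpha))
    print("E_iso (erg):", e_iso(1.0, 6700.0, 0.01, alpha, beta, E0))
```

Because Eiso scales with the square of the luminosity distance, any change in the assumed cosmology rescales all the Eiso values of a sample in a redshift-dependent way, which is the property exploited in the cosmological applications of the correlation.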


The existence of a strong correlation between Ep,i and Eiso of long GRBs was inferred more than 20 years ago
based on the systematic analysis of GRB spectra and fluences (Lloyd et al., 2000), and was actually discovered in 2002
(Amati et al., 2002) based on the first sample of BeppoSAX GRBs with measured redshift. The Ep,i–Eiso (“Amati”)
correlation was then confirmed by later measurements by several different GRB detectors and can be modeled as a
linear relation between the logarithms of the two quantities:

    \log\left( \frac{E_{p,i}}{1\,\mathrm{keV}} \right) = a \log\left( \frac{E_{iso}}{10^{52}\,\mathrm{erg}} \right) + b .    (33)


The Ep,i–Eiso correlation (see Fig. 11) is characterized by an additional intrinsic, extra-Poissonian scatter, σint,
around the best-fit line that has to be taken into account and determined together with (a, b) by the fitting procedure.
A commonly used method is the maximization of the likelihood implemented by Reichart et al. (2001), specifically
developed for fitting data that are affected by extrinsic scatter in addition to the intrinsic uncertainties along both
axes, which is equivalent to minimizing:

    L(a, b, \sigma_{int}) = \frac{1}{2} \sum \log\left( \frac{\sigma_{int}^2 + \sigma_{y_i}^2 + a^2 \sigma_{x_i}^2}{1 + a^2} \right) + \frac{1}{2} \sum \frac{(y_i - a x_i - b)^2}{\sigma_{int}^2 + \sigma_{y_i}^2 + a^2 \sigma_{x_i}^2} .    (34)

Here the sum is over the N objects in the sample. We note that this maximization can actually be performed in the
two-parameter space (a, σint) only, since b may be calculated analytically by solving the equation ∂L/∂b = 0, which gives:

    b = \left[ \sum \frac{y_i - a x_i}{\sigma_{int}^2 + \sigma_{y_i}^2 + a^2 \sigma_{x_i}^2} \right] \left[ \sum \frac{1}{\sigma_{int}^2 + \sigma_{y_i}^2 + a^2 \sigma_{x_i}^2} \right]^{-1} .    (35)


Figure 11: The Ep,i–Eiso correlation for long GRBs based on the updated sample of 208 events used for this
review. Blue points indicate GRBs detected and localized by the Swift satellite.


The values of the normalization, slope and intrinsic dispersion of the Ep,i–Eiso correlation in the logarithmic
form expressed above are found to be ~2, ~0.5 and ~0.2 dex, respectively, with slight variations depending on the
sub-sample considered (e.g., Amati et al., 2002; Ghirlanda et al., 2004; Amati, 2006; Amati et al., 2008; Amati and
Della Valle, 2013; Demianski et al., 2017).
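A fit of a straight line with intrinsic scatter and uncertainties on both axes can be sketched as follows. The Python example below is a simplified illustration and not the code used for the results quoted in this review: it assumes that the normalization term of the likelihood reads log[(σint² + σy² + a²σx²)/(1 + a²)], it computes b analytically at each trial (a, σint), and it replaces a proper optimizer or MCMC sampler with a brute-force grid search. The function names, the grids, and the synthetic data are all hypothetical.

```python
import numpy as np

def neg_log_like(a, sig_int, x, y, sx, sy):
    """Cost function for a line y = a*x + b with intrinsic scatter sig_int.

    The intercept b is obtained analytically from dL/db = 0.
    Returns (cost, b).
    """
    var = sig_int ** 2 + sy ** 2 + a ** 2 * sx ** 2
    b = np.sum((y - a * x) / var) / np.sum(1.0 / var)
    cost = (0.5 * np.sum(np.log(var / (1.0 + a ** 2)))
            + 0.5 * np.sum((y - a * x - b) ** 2 / var))
    return cost, b

def fit_grid(x, y, sx, sy, a_grid, s_grid):
    """Brute-force minimization over the two-parameter space (a, sig_int)."""
    best_cost, best = np.inf, None
    for a in a_grid:
        for s in s_grid:
            cost, b = neg_log_like(a, s, x, y, sx, sy)
            if cost < best_cost:
                best_cost, best = cost, (a, b, s)
    return best

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 200
    x_true = rng.normal(0.0, 1.0, n)
    # Synthetic sample with slope 0.5, normalization 2 and 0.2 dex of scatter
    y_true = 0.5 * x_true + 2.0 + rng.normal(0.0, 0.2, n)
    sx = np.full(n, 0.05)
    sy = np.full(n, 0.05)
    x = x_true + rng.normal(0.0, sx)
    y = y_true + rng.normal(0.0, sy)
    print(fit_grid(x, y, sx, sy,
                   np.linspace(0.3, 0.7, 81), np.linspace(0.05, 0.4, 71)))
```

In a real analysis the grid would be replaced by a sampler, so that the full posterior of (a, b, σint) is available for propagation into the cosmological constraints.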


The existence and properties of this correlation have been widely investigated by many research groups in the 
last twenty years, because of its key role for the understanding of the GRB prompt emission physics, jet structure 
and geometry and viewing angle effects, as well as for the identification and nature of different sub-classes of these 
events, such as: short vs. long, X-Ray Flashes and under-luminous GRBs, ultra-long GRBs, etc. (see, e.g., Zhang 
and Mészáros, 2002; Amati, 2006; Zhang, 2014; Kumar and Zhang, 2015; Pe’er, 2015). 


3.3.1.2 Independent measurements of cosmological parameters through the Ep,i–Eiso correlation of GRBs


The “Amati” relation becomes a distance indicator through the measurement of Eiso, which is derived from the
observed fluence and in turn depends on the geometry and expansion rate of our Universe through the
luminosity distance. Unlike historical “standardized” candles such as SNe Ia, which can be calibrated via Cepheids (e.g.,
Riess et al., 2021a), we do not have a statistically significant sample of GRBs at low redshift allowing us to determine
the parameters of the correlation in a cosmology-independent way. This means that the existence and properties
of the correlation were found by assuming a fiducial cosmological model. Thus, if we wish to use it for measuring
cosmological parameters, we are affected by a circularity problem. The most direct way to get rid of it is
to simultaneously constrain the calibration parameters (a, b, σint) and the set of cosmological parameters by
considering a chosen likelihood function. In practice, this task consists in determining the multi-dimensional probability
distribution function (PDF) of the parameters {a, b, σint, p}, where p is the N-dimensional vector of the cosmological
parameters.


This is the method adopted by Amati et al. (2008) in the first work aimed at verifying whether the Ep,i–Eiso correlation
could actually be used for cosmology. By assuming a flat ΛCDM cosmology, it was found that the goodness
of fit of the correlation varies as a function of Ωm following a roughly parabolic shape, with a minimum at about 0.2–0.3,
as shown in Fig. 12. The analysis performed on larger samples in the following years made this result more and
more reliable and accurate (see, e.g., Amati and Della Valle, 2013), showing that GRBs provide - in the framework


Figure 12: Goodness of fit (in terms of −log(likelihood)) of the Ep,i–Eiso correlation of long GRBs (based on the
updated sample of 208 events used for this review) as a function of the value of Ωm assumed in the computation
of the Eiso values, for a flat ΛCDM cosmology.


of ΛCDM cosmology - firm and independent evidence for an accelerating Universe with Ωm ~ 0.3. This
result is further confirmed by relaxing the flat-Universe assumption, i.e., by letting both Ωm and ΩΛ free to vary
(see next sections).
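The dependence of the Eiso values on the assumed cosmology enters only through the luminosity distance, so a scan over Ωm such as the one of Fig. 12 requires recomputing DL(z) for each trial value. The Python sketch below shows this step for a flat ΛCDM model. It is an illustration under stated assumptions: the Hubble constant of 70 km s⁻¹ Mpc⁻¹, the fiducial Ωm = 0.3, and the function names are choices made for this example only, and radiation is neglected.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light in km/s

def lum_dist_flat_lcdm(z, om, h0=70.0, n=2001):
    """Luminosity distance in Mpc for a flat LambdaCDM model (no radiation)."""
    zz = np.linspace(0.0, z, n)
    inv_e = 1.0 / np.sqrt(om * (1.0 + zz) ** 3 + (1.0 - om))
    # Trapezoidal integration of dz / E(z) from 0 to z
    integral = np.sum(0.5 * (inv_e[1:] + inv_e[:-1]) * np.diff(zz))
    return (1.0 + z) * C_KM_S / h0 * integral

def rescale_eiso(e_iso_fid, z, om_new, om_fid=0.3):
    """Rescale an E_iso value computed in a fiducial cosmology to a new Omega_m,
    using E_iso proportional to D_L^2 at fixed fluence and redshift."""
    ratio = lum_dist_flat_lcdm(z, om_new) / lum_dist_flat_lcdm(z, om_fid)
    return e_iso_fid * ratio ** 2

if __name__ == "__main__":
    for om in (0.1, 0.3, 0.5, 1.0):
        print(om, rescale_eiso(1.0e53, 2.0, om))
```

For each trial Ωm, the rescaled Eiso values would then be passed to the fit of the correlation, and the resulting goodness of fit recorded as a function of Ωm.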


3.3.1.3 Calibrating the Ep,i–Eiso correlation with SNe Ia and other probes


In addition to the clean and independent approach described above, alternative techniques for getting rid
of the circularity issue when using the Ep,i–Eiso (Liso) correlation for cosmology have been developed and presented
in the literature (see for instance Montiel et al., 2021; Amati et al., 2019; Muccino et al., 2021; Izzo et al., 2015; Wang
et al., 2015; Liang et al., 2008; Kodama et al., 2008; Wei, 2010; Lin et al., 2015).


Most of these methods calibrate the correlation for those GRBs at redshift lower
than about 1.5 using the luminosity distances derived from SNe Ia (see Kodama et al., 2008; Liang et al., 2008;
Demianski et al., 2017). In this way, more accurate estimates of cosmological parameters can be obtained, at the
price of making GRBs a no longer completely independent cosmological probe.


Figure 13: GRB Hubble diagram built by calibrating the Ep,i–Eiso correlation for the updated sample of
208 GRBs used for this review.


The typical regression procedure adopted in these approaches can be sketched as follows:


1. set the redshift range where the distance modulus, μ(z), has to be reconstructed;


2. sort the SNe Ia sample by increasing value of |z − zi| and select the first n = α NSNeIa, where α is a user-selected
value and NSNeIa the total number of SNe Ia;


3. apply the weight function

    W(u) = \begin{cases} (1 - |u|^2)^2 & |u| \le 1 \\ 0 & |u| > 1 \end{cases}    (36)

where u = |z − zi|/Δ and Δ is the highest value of |z − zi| over the previously selected subset;


4. fit a first-order polynomial to the data previously selected and weighted, and use the zeroth-order term as the
best-fit value of the distance modulus μ(z);


5. evaluate the error σμ as the root mean square of the weighted residuals with respect to the best-fit value.
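The five steps above can be sketched in Python as a weighted local linear regression. The example is one possible reading of the procedure and not the code used for this review: the function names are hypothetical, the default value of the fraction α is an arbitrary choice, the weighted root mean square is normalized by the sum of the weights, and the measurement errors of the individual SNe Ia are ignored.

```python
import numpy as np

def weight(u):
    """Weight function of Eq. 36."""
    u = np.abs(u)
    return np.where(u <= 1.0, (1.0 - u ** 2) ** 2, 0.0)

def local_mu(z, z_sn, mu_sn, alpha=0.05):
    """Reconstruct the distance modulus at redshift z from a SN Ia sample.

    Returns (mu, sigma_mu): the zeroth-order term of the local linear fit
    and the weighted rms of the residuals.
    """
    z_sn = np.asarray(z_sn, dtype=float)
    mu_sn = np.asarray(mu_sn, dtype=float)
    # Step 2: the n nearest SNe in redshift (at least 3 to constrain a line)
    n = max(3, int(round(alpha * len(z_sn))))
    dist = np.abs(z_sn - z)
    idx = np.argsort(dist)[:n]
    # Step 3: weights from Eq. 36, with Delta the largest selected distance
    delta = dist[idx].max()
    w = weight(dist[idx] / delta)
    # Step 4: weighted least squares for mu = c0 + c1 * (z_i - z)
    dz = z_sn[idx] - z
    X = np.column_stack([np.ones_like(dz), dz])
    XtW = X.T * w
    coef = np.linalg.solve(XtW @ X, XtW @ mu_sn[idx])
    # Step 5: weighted rms of the residuals
    resid = mu_sn[idx] - X @ coef
    sigma = np.sqrt(np.sum(w * resid ** 2) / np.sum(w))
    return coef[0], sigma

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    z_sn = np.sort(rng.uniform(0.01, 1.4, 500))
    mu_sn = 43.0 + 5.0 * np.log10(z_sn) + rng.normal(0.0, 0.15, 500)  # toy data
    print(local_mu(0.8, z_sn, mu_sn))
```

Because the fit is centred on the target redshift, the zeroth-order coefficient is directly the reconstructed μ(z), as in step 4.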


We then use the reconstructed μ(z) to fit the Ep,i–Eiso correlation. We considered only GRBs with
z < 1.414 to cover the same redshift range as is spanned by the SNe Ia data. To standardize the Ep,i–Eiso relation
as expressed by Eq. 33, we need to fit a data array {xi, yi} with uncertainties {σx,i, σy,i} to a straight line and
determine the parameters (a, b).


After estimating the parameters of the correlation, we use them to construct the GRB Hubble diagram. We have:

    D_L^2(z) = \frac{E_{iso}\,(1 + z)}{4\pi S_{bolo}} ,    (37)

where Sbolo is the bolometric fluence. The uncertainty of DL(z) was estimated through the propagation of the
measurement errors of the pertinent quantities. It turns out that

    5 \log D_L(z) = \frac{5}{2} \left\{ b + a \log E_{p,i} - \log\left( \frac{4\pi S_{bolo}}{1 + z} \right) \right\} + \mu_0 ,    (38)

where b + a log Ep,i is the value of log Eiso predicted by the calibrated correlation (fitted here with y = log Eiso
and x = log Ep,i), and μ0 is a normalization parameter, due to the fact that the distance moduli of GRBs are not absolute; thus, this
cross-calibration parameter is needed to match the GRB Hubble diagram and the one of SNe Ia (see for instance
Demianski et al., 2021). In Fig. 13 we plot the GRB Hubble diagram obtained for a new sample of 212 objects.
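The conversion from an Eiso value to a point of the Hubble diagram relies only on Eq. 37 and on the definition of the distance modulus. The short Python sketch below shows this step. It is an illustration with hypothetical function names: Eiso is assumed to be in erg and the bolometric fluence in erg cm⁻², the standard definition μ = 5 log10(DL/Mpc) + 25 is used, and the cross-calibration offset μ0 is treated as a free additive constant supplied by the user.

```python
import math

MPC_TO_CM = 3.0856775814913673e24  # 1 Mpc in cm

def lum_dist_from_eiso(e_iso, s_bolo, z):
    """Luminosity distance in Mpc from Eq. 37.

    e_iso  : isotropic-equivalent energy (erg), e.g. predicted by a
             calibrated correlation from the measured E_p,i
    s_bolo : bolometric fluence (erg / cm^2)
    """
    d_l_cm = math.sqrt(e_iso * (1.0 + z) / (4.0 * math.pi * s_bolo))
    return d_l_cm / MPC_TO_CM

def distance_modulus(d_l_mpc, mu0=0.0):
    """Distance modulus, with an optional cross-calibration offset mu0."""
    return 5.0 * math.log10(d_l_mpc) + 25.0 + mu0

if __name__ == "__main__":
    d_l = lum_dist_from_eiso(1.0e53, 1.0e-5, 2.0)   # illustrative values
    print(d_l, distance_modulus(d_l))
```

The uncertainty on μ would follow from propagating the errors on Ep,i, on the fluence, and on the calibration parameters, together with the intrinsic scatter of the correlation.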


3.3.2 Measurements and sample selection 


The use of GRBs for measuring cosmological parameters through the Ep,i–Eiso correlation, or other correlations
involving the spectral peak energy Ep,i (see, e.g., Tab. 4), requires i) the measurement of the redshift, through either
absorption spectroscopy of the optical/NIR afterglow spectrum or emission line spectroscopy of the host galaxy, and
ii) the measurement of the prompt emission spectrum over a broad energy band and for most of the duration of
the event, to allow an accurate characterization of the spectral continuum curvature. These combined requirements
reduce the size of the sample from the several thousands of GRBs detected since the ’70s to fewer than three hundred
nowadays. For instance, while GRB broad-band spectroscopy from 10–20 keV up to a few MeV was already available
in the ’80s and ’90s, thanks, e.g., to the CGRO/BATSE and Konus-WIND GRB detectors, it was possible to discover
GRB afterglow emission and hence get the first redshift measurements only in the late ’90s. On the other hand, the
Swift mission, while providing fast and accurate localization of GRB prompt and afterglow emissions, thus substantially
improving the efficiency of the follow-up process leading to redshift determination, is limited by the narrow energy
band (15–150 keV) of its GRB detector.


The samples used up to now for this line of investigation (e.g., Amati et al., 2008; Amati and Della Valle, 2013;
Demianski et al., 2017; Amati et al., 2019; Demianski et al., 2021) include GRBs with measured redshift for
which detection, localization, and spectral measurements come from the following main GRB missions: BATSE,
BeppoSAX, HETE-2, Konus-WIND, Fermi/GBM, Swift/BAT. For this work, we consider a slightly updated sample
with respect to that used by Amati et al. (2019) and Demianski et al. (2021), comprising a total number of 208 GRBs.
This update is based on events for which redshift and spectral measurements became available in 2017 and 2018.
A substantially updated sample including data from very recent Konus-WIND, Konus-WIND + Swift/BAT and
Fermi/GBM spectral catalogs will be presented and analyzed in Amati et al. (in prep.), as well as in the next version
of this review.


As discussed, e.g., in Demianski et al. (2017) and Demianski et al. (2021), the criteria behind selecting the
measurements from a particular mission are based on objective conditions aimed at minimizing selection and systematic
effects (see also Sect. 3.3.3):


• given the broad energy band and good calibration, spectral measurements by Konus-WIND and Fermi/GBM
are preferably chosen whenever available. The Swift/BAT observations were chosen when no other preferred
mission (Konus-WIND, Fermi/GBM) was able to provide information. They were considered only for GRBs
with the observed value of Ep within the energy band of the instrument;


• in order to minimize biases due to event spectral evolution, and hence possible systematics on Ep,i, only GRBs
for which the exposure time was at least 2/3 of the whole event duration are selected (this condition is satisfied
by about 80% of the publicly available spectral catalogs);


• those GRBs usually classified as “under-luminous events”, for which there is a significant possibility that their
radiated energy, luminosity and spectral parameters are strongly biased by off-axis viewing effects or very
long-to-soft spectral evolution (see, e.g., Amati, 2006; Martone et al., 2017), as well as being a different class
of events with respect to classical cosmological long GRBs, are not included in the sample.


In the estimates of Ep,i and Eiso, the values and uncertainties of all the observations are taken into account.
For each observation included in the data sample, it has been checked that the uncertainty on any value
is not below 10 per cent, in order to account for the instrumental capabilities. When the error was lower, it has been
assumed to be 10%, which is a reliable level of accuracy in the calibration of this kind of detector. When available,
the Band model (Band et al., 1993) was considered, since the cut-off power law tends to overestimate the value of Ep,i.
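The 10% floor on the uncertainties described above amounts to a simple element-wise operation, sketched here in Python. The function name and the example values are hypothetical and serve only to illustrate the rule.

```python
import numpy as np

def apply_error_floor(values, errors, floor=0.10):
    """Replace any uncertainty smaller than floor * |value| with floor * |value|."""
    values = np.asarray(values, dtype=float)
    errors = np.asarray(errors, dtype=float)
    return np.maximum(errors, floor * np.abs(values))

if __name__ == "__main__":
    ep_i = np.array([120.0, 450.0, 980.0])     # keV, illustrative values
    ep_i_err = np.array([5.0, 60.0, 40.0])
    print(apply_error_floor(ep_i, ep_i_err))
```

Uncertainties already above the floor are left unchanged, so the operation only affects the measurements whose quoted errors are smaller than the assumed calibration accuracy.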


3.3.3 Systematic effects 


Given their relevance for shedding light on the emission processes, on the jet properties (e.g., structure, degree of 
magnetization), on the identification and understanding of different sub-classes of GRBs (long, short, under-luminous, 
ultra-long, GRB-SN connection), as well as their great potential for GRB cosmology, the Ep,i–Eiso and other 
main correlations involving prompt and afterglow emission properties have been the subject of many thorough investigations 
aimed at identifying, understanding, and overcoming possible selection effects and systematics (see, e.g., Dainotti 
and Amati 2018 for an exhaustive review). 


Reliability of the Ep,i–Eiso correlation. Different GRB detectors are characterized by different thresholds and 
spectroscopic sensitivities, and can therefore introduce relevant selection effects and biases in the observed Ep,i–Eiso 
correlation. In the past, there were claims that a high fraction (70–90%) of BATSE GRBs without redshift would be 
inconsistent with the correlation for any redshift (Band and Preece, 2005; Nakar and Piran, 2005). However, this 
“peculiar” conclusion was refuted by other authors (Ghirlanda et al., 2005; Bosnjak et al., 2008; Ghirlanda et al., 
2008; Nava et al., 2011), who showed that, in fact, most BATSE GRBs with unknown redshift were well consistent with 
the Ep,i–Eiso correlation. We also note that the inconsistency of such a high percentage of GRBs of unknown redshift 
would have implied that most GRBs with known redshift should also be inconsistent with the Ep,i–Eiso relation, and 
this was never observed. Moreover, Amati et al. (2009) showed that the normalization of the correlation varies 
only marginally using GRBs measured by individual instruments with different sensitivities and energy bands, while 
Ghirlanda et al. (2010) showed that the parameters of the correlation (m and q) are independent of redshift. 


Furthermore, the Swift satellite, thanks to its capability of providing quick and accurate localization of GRBs, 
thus reducing the selection effects in the observational chain leading to the estimate of GRB redshift, has further 
confirmed the reliability of the Ep,i–Eiso correlation (Amati et al., 2009; Ghirlanda et al., 2010; Sakamoto et al., 
2011). 


Finally, based on time-resolved analysis of BATSE, BeppoSAX, and Fermi GRBs, it was found that the Ep,i–Eiso 
correlation also holds within each single GRB, with normalization and slope consistent with those obtained 
with time-averaged spectra and energetics/luminosities (Ghirlanda et al., 2010; Lu et al., 2012; Frontera et al., 2012; 
Basak and Rao, 2013). This ultimate test confirms the physical origin of the correlation, also providing clues to its 
explanation. 


Possible evolutionary effects of the Ep,i–Eiso correlation. Possible evolutionary effects that may affect the 
correlation have been investigated by several authors. By dividing the GRB sample into subsets with different 
redshift ranges (e.g., 0.1 < z < 1, 1 < z < 2, etc.), it is found that the slope, normalization, and dispersion of the 
correlation do not change significantly. This result also implies that Malmquist-like selection effects are negligible. 


In any case, to take into account possible evolutionary effects due, for instance, to the distribution of local 
inhomogeneities along the GRB line of sight (see, for instance, Shirokov et al., 2020; Demianski et al., 2021), 
it is also possible to consider a sort of extended Ep,i–Eiso correlation, introducing terms representing the redshift 
evolution in the form of power-law functions, $g_{\rm iso}(z) = (1+z)^{k_{\rm iso}}$ and $g_p(z) = (1+z)^{k_p}$, so that 
$E'_{\rm iso} = E_{\rm iso}/g_{\rm iso}(z)$ and $E'_{p,i} = E_{p,i}/g_p(z)$ are the new fitting quantities (see also 
Shirokov et al., 2020; Demianski et al., 2021). In this approach, 
we consider a correlation with three parameters a, b, and $k_{\rm iso} - a k_p$: 

$$\log\left[\frac{E_{\rm iso}}{1\,{\rm erg}}\right] = b + a\,\log\left[\frac{E_{p,i}}{300\,{\rm keV}}\right] + (k_{\rm iso} - a k_p)\,\log(1+z). \tag{39}$$

The redshift dependence term in Eq. 39 can be expressed by a single average coefficient γ: 

$$\log\left[\frac{E_{\rm iso}}{1\,{\rm erg}}\right] = b + a\,\log\left[\frac{E_{p,i}}{300\,{\rm keV}}\right] + \gamma\,\log(1+z). \tag{40}$$


To calibrate this 3D relation we have to fit the coefficients a, b, γ, and the intrinsic scatter σint. It turns out that 
low values of γ would indicate negligible evolutionary effects. Therefore it is possible to consider a 3D Reichart 
likelihood, which is: 

$$L^{3D}_{\rm Reichart}(a, \gamma, b, \sigma_{\rm int}) = \frac{1}{2}\,\frac{\sum_i \log\left(\sigma_{\rm int}^2 + \sigma_{y_i}^2 + a^2\sigma_{x_i}^2\right)}{\log(1+a^2)} + \frac{1}{2}\sum_i \frac{(y_i - a x_i - \gamma z_i - b)^2}{\sigma_{\rm int}^2 + \sigma_{y_i}^2 + a^2\sigma_{x_i}^2}. \tag{41}$$

This likelihood can be maximized with respect to a and γ, since b can be evaluated analytically by solving the 
equation: 

$$\frac{\partial}{\partial b}\,L^{3D}_{\rm Reichart}(a, \gamma, b, \sigma_{\rm int}) = 0. \tag{42}$$

Actually, it turns out that: 

$$b = \left[\sum_i \frac{y_i - a x_i - \gamma z_i}{\sigma_{\rm int}^2 + \sigma_{y_i}^2 + a^2\sigma_{x_i}^2}\right]\left[\sum_i \frac{1}{\sigma_{\rm int}^2 + \sigma_{y_i}^2 + a^2\sigma_{x_i}^2}\right]^{-1}. \tag{43}$$
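

As an illustration of Eqs. 41 and 43, the following Python sketch evaluates the 3D Reichart function and the analytic normalization b on toy data. It assumes that x_i, y_i and z_i are the logarithmic quantities entering Eq. 40 (peak energy, isotropic energy, and 1+z) with uncertainties sx and sy; the function names and the data are hypothetical. Because Eq. 41 has the form of a negative log-likelihood, the sketch assumes that the best fit is where it is smallest, which is a reading of the formula rather than a statement from the text.

```python
import math


def total_variances(a, sigma_int, sx, sy):
    """Per-GRB variance sigma_int^2 + sigma_y^2 + a^2 * sigma_x^2."""
    return [sigma_int**2 + syi**2 + (a * sxi) ** 2 for sxi, syi in zip(sx, sy)]


def best_b(a, gamma, sigma_int, x, y, z, sx, sy):
    """Normalization b solving dL/db = 0 (Eq. 43): a weighted mean of residuals."""
    var = total_variances(a, sigma_int, sx, sy)
    num = sum((yi - a * xi - gamma * zi) / v for xi, yi, zi, v in zip(x, y, z, var))
    den = sum(1.0 / v for v in var)
    return num / den


def reichart_3d(a, gamma, b, sigma_int, x, y, z, sx, sy):
    """Eq. 41 as written; requires a != 0 because of the log(1 + a^2) term."""
    var = total_variances(a, sigma_int, sx, sy)
    log_term = 0.5 * sum(math.log(v) for v in var) / math.log(1.0 + a**2)
    chi2_term = 0.5 * sum(
        (yi - a * xi - gamma * zi - b) ** 2 / v
        for xi, yi, zi, v in zip(x, y, z, var)
    )
    return log_term + chi2_term


# Hypothetical toy data lying exactly on y = 52.0 + 1.5 x + 0.3 z.
x = [0.1, 0.5, -0.3, 0.8]
z = [0.2, 0.4, 0.6, 0.3]
y = [52.0 + 1.5 * xi + 0.3 * zi for xi, zi in zip(x, z)]
sx = [0.05] * 4
sy = [0.10] * 4

b_hat = best_b(1.5, 0.3, 0.2, x, y, z, sx, sy)  # recovers the input normalization
```

In a real fit one would scan or sample a, γ and σint, calling best_b at each step so that b never needs to be an explicit free parameter.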

3.3.4 Main results and forecasts 


In this section we show and discuss the current and prospective potential of the three methods described above for 
using GRBs as probes of the expansion rate and geometry of the Universe. The main results are based on the partially 
updated sample of 208 GRBs described above, while the forecasts are based on a sample of 208 real + 292 simulated 
GRBs (500 GRBs in total), the latter being those that may be expected from future dedicated space missions, as 
described below. The simulated sample was produced following the procedure and assumptions detailed in 
Amati and Della Valle (2013). 


3.3.4.1 GRBs as independent probes 


In Tab. 5, we show the 68% confidence level intervals for Ωm and w0 in a flat FLRW universe derived with the 70 
GRBs of Amati et al. (2008), the partially updated sample of 208 GRBs, and the partially simulated sample of 500 
GRBs. These values were obtained with the same approach as Amati et al. (2008), but using the likelihood function 
proposed by Reichart et al. (2001), which has the advantage of not requiring the arbitrary choice of an independent 
variable between Ep,i and Eiso. Interestingly, we note that, after increasing the number of GRBs from N1 = 70 to 
N2 = 156, the accuracy of the estimate of Ωm improves by a factor of ~ √(N2/N1). The accuracy of these measurements 
is still lower than that obtained with supernova data, but promising in view of the increasing number of GRBs with 
measured redshift and spectra (see also Fig. 12, Fig. 14, and Sect. 3.3.4.2). 


In the last two lines of Tab. 5, we report the estimates of Ωm and w0 derived from the present and expected future 
samples by assuming that the Ep,i–Eiso correlation is calibrated with a 10% accuracy by using, e.g., the luminosity 
distances provided by SNe Ia, GRB self-calibration, or the other methods briefly described below. The prospects 
of this method for improving estimates of Ωm and for investigating the properties of dark energy, combined with 
the expected increase of the number of GRBs in the sample, are shown in Fig. 14. In particular, as an example, we 
show the current and expected accuracy on w0 in the case of an evolving dark energy with wa = 0.5. 


It is important to note that, as the number of GRBs in each z-bin increases, the feasibility and accuracy of 
the self-calibration of the Ep,i–Eiso correlation will also improve. Thus, the expected results shown in the last part of 
Tab. 5 and in Fig. 14 may be obtained even without the need of calibrating GRBs against other cosmological probes. 


The results presented in Tab. 5 show a sharp increase in the accuracy of Ωm as a consequence of the increasing 
number of GRBs in the Ep,i–Eiso plane. Currently, the main contribution to enlarging the GRB sample comes from 
joint detections by Swift, Fermi/GBM or Konus-WIND. Hopefully, these missions will continue to operate in the 
coming years, providing a current rate of ~15–20 GRBs/year. However, a real breakthrough in this field 
should come from next-generation missions capable of promptly pinpointing the GRB localization and of carrying 
out broad-band spectroscopy. We build our hopes on the Chinese-French mission SVOM (Bertrand et al., 2019), for 
the very near future, and on mission concepts like THESEUS (Amati et al., 2018) for the next decade. 


In Fig. 14 we show the confidence level contours in the Ωm–Ωde and Ωm–w0 planes obtained by using the real data, and by 
adding to them the 292 simulated GRBs (resulting in a sample of 500 GRBs in total). The simulated data 
set was obtained via Monte Carlo techniques by taking into account the slope, normalization and dispersion of the 
observed Ep,i–Eiso correlation, the observed redshift distribution of GRBs and the distribution of the uncertainties 
in the measured values of Ep,i and Eiso. These simulations indicate that with a sample of 500 GRBs (achievable 
within a few years from now) the accuracy in measuring Ωm will be comparable to that currently provided by SNe 
data. 
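

A minimal sketch of this kind of Monte Carlo simulation is given below. It is not the procedure of Amati and Della Valle (2013): it simply resamples hypothetical lists of observed redshifts, log Ep,i values and uncertainty pairs, and scatters the simulated log Eiso around an assumed linear relation. The function name, the argument names and all numerical values are illustrative assumptions.

```python
import math
import random


def simulate_grbs(n, a, b, sigma_int, observed_z, observed_x, observed_err, seed=0):
    """Draw n mock GRBs around y = b + a * x, with y = log Eiso and x = log Ep,i.

    observed_z   : redshifts to resample (observed redshift distribution)
    observed_x   : log Ep,i values to resample
    observed_err : (sigma_x, sigma_y) pairs to resample (uncertainty distribution)
    """
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        z = rng.choice(observed_z)
        x_true = rng.choice(observed_x)
        sx, sy = rng.choice(observed_err)
        # Intrinsic dispersion and measurement error are added in quadrature.
        y_obs = rng.gauss(b + a * x_true, math.hypot(sigma_int, sy))
        x_obs = rng.gauss(x_true, sx)
        sample.append({"z": z, "x": x_obs, "y": y_obs, "sx": sx, "sy": sy})
    return sample


# Hypothetical inputs; a real analysis would use the measured sample instead.
mock = simulate_grbs(
    n=292, a=1.7, b=52.5, sigma_int=0.3,
    observed_z=[0.5, 1.0, 1.8, 2.5, 4.0],
    observed_x=[-0.4, -0.1, 0.0, 0.2, 0.5],
    observed_err=[(0.05, 0.05), (0.08, 0.10)],
    seed=1,
)
```

Fixing the seed makes the mock sample reproducible, which helps when comparing forecasts obtained with different fitting choices.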


Number of GRBs                                       Ωm (flat)             w0 (flat, Ωm = 0.3, wa = 0.5) 
70 (real) GRBs (Amati et al., 2008)                  0.27 […]              < −0.3 (90%) 
208 (real) GRBs (this work)                          0.26 (+0.23, −[…])    […] 
500 (208 real + 292 simulated) GRBs                  0.29 […]              −0.9 (+0.2, −[…]) 
208 (real) GRBs, calibration                         0.30 […]              […] 
500 (208 real + 292 simulated) GRBs, calibration     0.30 […]              […] 


Table 5: Comparison of the 68% confidence intervals on Ωm and w0 (Ωm = 0.3, wa = 0.5) for a flat FLRW universe 
obtained with the sample of 70 GRBs of Amati et al. (2008), the updated sample of 208 GRBs considered in this 
work and the simulated sample of 500 GRBs (see text). In the last two lines we also show the results obtained for 
the same samples by assuming that the slope and normalization of the Ep,i–Eiso correlation are known with a 
10% accuracy based, e.g., on calibration against SNe Ia or self-calibration with a large enough number of GRBs 
at similar redshift. 


Figure 14: Left: 68% confidence level contour in the Ωm–Ωde plane obtained by releasing the flat universe 
assumption with the sample of 208 GRBs considered in this work (red contour), compared to those obtained with 
a sub-sample of 120 GRBs and to what is expected in the coming years as the number of GRBs in the sample increases (500 
GRBs, blue). Right: 68% confidence level contour in the w0–Ωm plane for a flat FLRW universe with Ωm = 0.3 
obtained for the same samples as for the left panel. As for the results and simulations reported in Tab. 5, 
wa = 0.5 was assumed for the dark energy equation of state. 


3.3.4.2 Use of GRBs calibrated against SNe Ia 


To test different cosmological models, we use a Bayesian approach based on the MCMC method. In order to set 
the starting points for our chains, we first performed a preliminary and standard fitting procedure to maximize the 
likelihood function $\mathcal{L}(\mathbf{p})$. We sample the space of parameters by running five parallel chains and use the 
Gelman–Rubin diagnostic to test the convergence. As a test probe, it uses the reduction factor R, which is the 
square root of the ratio of the between-chain variance to the within-chain variance. A large R indicates that the 
between-chain variance is substantially greater than the within-chain variance, so that a longer simulation is needed. 
We require that R converges to 1 for each parameter. We set R − 1 of order 0.05, which is more restrictive than 
the often used and recommended value R − 1 < 0.1 for standard cosmological investigations. We discarded the first 
30% of the iterations at the beginning of any MCMC run, and thinned the chains that were run many times. 
We finally extracted the constraints on cosmological parameters by co-adding the thinned chains. The histograms of 
the parameters from the merged chains were then used to infer median values and confidence ranges. As a simple 
example, let us consider the CPL parameterization of the dark energy EoS described in Eq. 9. In Fig. 15 we plot the 
2D confidence regions in the w0–wa plane for the CPL model, obtained from real (upper panel) and a simulated 
(bottom panel) GRBs Hubble diagram. 
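

The convergence test and chain post-processing described above can be sketched as follows. The code implements the standard Gelman–Rubin statistic for a single parameter (pooled variance over within-chain variance), which is assumed here to be the quantity meant by R; the chains are synthetic Gaussian draws used purely for illustration, and the function names are hypothetical.

```python
import random
import statistics


def burn_and_thin(chain, burn_fraction=0.3, thin=10):
    """Drop the first `burn_fraction` of the samples, then keep every `thin`-th one."""
    start = int(len(chain) * burn_fraction)
    return chain[start::thin]


def gelman_rubin(chains):
    """Reduction factor R for one parameter from equal-length chains."""
    n = len(chains[0])
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # Variance of the chain means, i.e. the between-chain variance divided by n.
    between_over_n = statistics.variance([statistics.fmean(c) for c in chains])
    pooled = (n - 1) / n * within + between_over_n
    return (pooled / within) ** 0.5


# Five synthetic "chains" standing in for MCMC output of one parameter.
rng = random.Random(42)
chains = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(5)]
chains = [burn_and_thin(c, burn_fraction=0.3, thin=1) for c in chains]

r_hat = gelman_rubin(chains)
converged = (r_hat - 1.0) < 0.05               # threshold quoted in the text
merged = [s for c in chains for s in c]        # co-add the chains
median = statistics.median(merged)
```

In a multi-parameter run the same test is applied to each parameter separately, and the chains are extended until all of them satisfy the threshold.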


We join our sample of 208 GRBs to a simulated sample of 792 objects. These simulated data have been obtained 
by implementing a Monte Carlo approach and taking into account the slope, normalization, and dispersion of the observed 
Ep,i–Eiso correlation. It is worth noting that the ΛCDM model, which in the CPL parameterization corresponds to 
w0 = −1 and wa = 0, is disfavoured with respect to a dynamical model of dark energy. 
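

For reference, the CPL equation of state and the corresponding expansion rate can be written in a few lines. Eq. 9 is outside this section, so the sketch assumes the standard CPL form w(z) = w0 + wa z/(1+z) and a flat universe containing only matter and dark energy; the function names are illustrative.

```python
import math


def w_cpl(z, w0, wa):
    """CPL dark energy equation of state, w(z) = w0 + wa * z / (1 + z)."""
    return w0 + wa * z / (1.0 + z)


def e_cpl(z, omega_m, w0, wa):
    """Dimensionless expansion rate E(z) = H(z)/H0 for a flat CPL cosmology."""
    dark_energy = (1.0 + z) ** (3.0 * (1.0 + w0 + wa)) * math.exp(-3.0 * wa * z / (1.0 + z))
    return math.sqrt(omega_m * (1.0 + z) ** 3 + (1.0 - omega_m) * dark_energy)


# LambdaCDM is recovered for w0 = -1 and wa = 0.
e_lcdm = e_cpl(1.0, 0.3, -1.0, 0.0)      # equals sqrt(0.3 * 8 + 0.7)
e_dyn = e_cpl(1.0, 0.3, -1.0, 0.5)       # a dynamical dark energy example
```

With wa > 0 the dark energy density grows with redshift relative to the ΛCDM case, so E(z) at fixed Ωm is larger.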


Figure 15: 2D confidence regions in the w0–wa plane for the CPL model, obtained from a simulated (right 
panel) and real (left panel) GRBs Hubble diagram. 


3.3.4.3 The “Combo” relation: shedding light on the evolution of dark energy 


As discussed by Izzo et al. (2015) and Muccino et al. (2021), an important step forward in this line of investigation 
may be provided by the use of the “Combo” relation, which extends the “Amati” relation through the inclusion of 
X-ray afterglow observables like the initial luminosity, the rest-frame duration of the shallow phase, and the index 
of the late power-law decay, combined with an innovative calibration method minimizing the dependence on the 
systematics possibly affecting SNe Ia. The main novelty provided by the Combo relation consists in the afterglow 
X-ray light-curve fitting procedure through a piece-wise function, first introduced by Willingale et al. (2007), that is 
capable of modeling the very early power-law decay and the following “plateau” emission (Izzo et al., 2015), getting rid 
of X-ray flaring emission superimposed on the underlying afterglow behavior (Zaninoni et al., 2014). This procedure, 
similar to the analysis currently developed for SNe Ia, allows one to measure with great accuracy the main observables of 
the Combo relation: indeed, among the entire sample of Swift long GRBs showing a complete light curve in X-rays, 
and characterized by a known peak energy of the corresponding prompt emission, no outliers have been found so far 
(Muccino et al., 2021; Xu et al., 2021; Wang et al., 2021). 


In a preliminary analysis on a sample of 60 GRBs with well-measured parameters of both prompt and early 
X-ray afterglow emission, Izzo et al. (2015) showed that the Combo relation could provide a value of Ωm = 
0.29 (+0.23, −[…]). By applying the Combo relation to an updated sample of 174 gamma-ray bursts, Muccino et al. (2021) 
could obtain tighter bounds on Ωm, and investigate the possible evidence of an evolving dark energy parameter w(z). 
As shown in Fig. 16, the w(z) evolution was studied by binning the GRB Hubble diagram in seven redshift intervals 
and assuming two priors on the Hubble constant in tension at 4.4σ, i.e., H0 = (67.4 ± 0.5) km s⁻¹ Mpc⁻¹ and H0 = 
(74.03 ± 1.42) km s⁻¹ Mpc⁻¹. It was found that at z < 1.2 w(z) agrees within 1σ with the standard value w = −1, 
whereas at larger z the w(z) estimated from GRBs seems to deviate from w = −1 at the 2σ and 4σ level, depending 
on the redshift bin (Fig. 16). These results indicate that dark energy’s influence is not negligible at larger z, and 
confirm the Combo relation as a powerful tool to investigate the cosmological evolution of dark energy. 


In view of the increasing size of the GRB database, thanks to future missions, the Combo relation is a promising 
tool for measuring Ωm with an accuracy comparable to that exhibited by SNe Ia, and for investigating a possible 
evolution of the dark energy up to z ~ 10. 


Figure 16: The DE EoS reconstructed evolution through the redshift-binned parameterization of w(z) (1σ and 
2σ from the inner/darker to the outer/lighter regions) for the selected H0. The dashed red lines mark the value w = −1 
in the flat ΛCDM model. The darker region shows un-physical EoS, i.e., exceeding the stiff matter regime. Image 
reproduced with permission from Muccino et al. (2021), copyright by the American Astronomical Society. 


3.3.4.4 The promises of correlations involving LX and Ta 


As discussed at the beginning of this section, and shown in Tab. 4, the quest for correlations between GRB properties, 
aimed at shedding light on the emission processes and at enabling the use of these phenomena for measuring 
cosmological parameters, involved not only the X/gamma-ray prompt phase but also the early X-ray afterglow emission. 
Among these, the most investigated are those involving the duration of the “plateau” phase, Ta, and the luminosity 
at the end of this phase, usually referred to as LX. Indeed, as shown and discussed by several authors (e.g. Dainotti 
et al. 2020 and references therein, Hu et al. 2021), there exists a significant correlation between these two quantities, 
as well as a 3D correlation obtained by including the peak luminosity of the prompt emission, Lp. In particular, it 
has been found that these correlations become tight for sub-samples selected based on other characteristics, including 
the nature of the progenitor and multi-wavelength properties. This method, while still affected by the relatively low 
number of events that can be used for each sub-sample and by sample selection effects, seems promising for the purpose 
of GRB cosmology, especially in view of the wealth of new data on GRB prompt and afterglow emission expected 
in the near future thanks to the continuing operation of Swift, Fermi, Konus-WIND and other GRB experiments, as 
well as the increased efficiency of follow-up with ground facilities. 


3.4 Standard Sirens 


As first pointed out by Schutz (1986), merging black holes and neutron stars, when observed in gravitational waves 
(GWs), can serve as powerful cosmological probes. These merging binaries emit GW signals that directly encode 
the luminosity distance to the binary, DL. The growing number of GW observations by LIGO (LIGO Scientific 
Collaboration et al., 2015), Virgo (Acernese et al., 2015) and KAGRA (Akutsu et al., 2020) provides a cosmological 
distance catalog, requiring no calibration other than general relativity. So far, most standard siren measurements 
have relied on the closest standard sirens with luminosity distances DL < 400 Mpc, so that they probe primarily 
the local distance-redshift relation through the Hubble constant H0. However, these analyses are starting to take 
advantage of the full gravitational-wave sample, which extends to DL ≈ 5 Gpc with the current LIGO and Virgo 
detections, and will extend past 10 Gpc with upgrades to the gravitational-wave detector network over the next 
few years. Standard sirens are therefore starting to provide measurements of the expansion history out to z > 1 in 
addition to measuring the Hubble constant. Furthermore, standard sirens are unique probes of modified 
gravitational-wave propagation, a prediction of many cosmological modified gravity and dark energy theories. 


3.4.1 Basic idea and equations 


When two compact objects, such as black holes and/or neutron stars, orbit each other, the time-varying mass 
quadrupole sources space-time perturbations, or GWs. At sufficiently tight orbital separations, the loss of energy and 
angular momentum through GW emission shrinks the orbit until the two objects merge, forming a bigger black hole or 
neutron star. Such sources of GWs are known as “compact binary coalescences”. A passing GW signal stretches 
and squeezes space-time, creating a relative change in length ΔL/L, known as the strain h, or GW amplitude. The 
typical strain for a GW signal sourced by a compact binary coalescence is 10⁻²¹. This stretching and squeezing of 
space-time happens at a certain frequency. The frequency of a GW from a compact binary coalescence is twice the 
orbital frequency, and it therefore evolves with time as the orbit shrinks. The frequency evolution is driven by a 
combination of the masses of the two compact objects known as the chirp mass. 


For compact binary coalescences, the GW strain as a function of time h(t) scales inversely with the luminosity 
distance DL. To first order: 

$$h(t) = \frac{\mathcal{M}_z^{5/3}\, f(t)^{2/3}}{D_L}\, F({\rm angles})\, \cos(\Phi(t)), \tag{44}$$

where f(t) is the GW frequency, F(angles) is a function of the source’s position on the sky, inclination and 
polarization, and Φ(t) is the orbital phase. The “intrinsic loudness” of the GW depends on the redshifted chirp mass 
Mz: 

$$\mathcal{M}_z = (1+z)\,\frac{(m_1 m_2)^{3/5}}{(m_1 + m_2)^{1/5}}, \tag{45}$$

for binary component masses m1 and m2, measured in the source frame; the factor of (1+z) converts between 
source-frame and detector-frame quantities. Interestingly, this same combination of masses governs the GW frequency 
evolution, f(t) and its derivative ḟ(t): 

$$\mathcal{M}_z = \frac{c^3}{G}\left(\frac{5}{96}\,\pi^{-8/3}\, f^{-11/3}\, \dot{f}\right)^{3/5}, \tag{46}$$


so that by measuring both the amplitude and frequency evolution of the GW signal, the luminosity distance can be 
derived. Note that the amplitude also depends on source geometry encoded in F(angles). For example, a face-on 
binary will emit a louder GW signal than an edge-on binary. We also see that while the cosmological redshift z affects 
the measured GW frequency, this effect is degenerate with the binary’s mass; only redshifted masses appear in the 
equations describing the amplitude and frequency of the GW signal. In order to do cosmology with GW sources, we 
must identify external sources of redshift information. Matching GW source distances with their redshifts allows us 
to probe the cosmological parameters with the usual distance-redshift relation: 


$$D_L = \frac{c\,(1+z)}{H_0}\int_0^z \frac{dz'}{E(z')}. \tag{47}$$


We discuss methods for measuring the redshift of GW sources in the following Sect. 3.4.2. 
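

The relations of this subsection can be illustrated with a short Python sketch: the redshifted chirp mass of Eq. 45, the chirp mass inferred from the frequency evolution (Eq. 46, in SI units), and the distance-redshift relation of Eq. 47. The expansion rate E(z) is taken to be that of flat ΛCDM, which is an assumption of this example, and the function names and physical constants are supplied here for illustration.

```python
import math

G = 6.67430e-11         # gravitational constant, m^3 kg^-1 s^-2
C = 2.99792458e8        # speed of light, m/s
M_SUN = 1.98847e30      # solar mass, kg


def redshifted_chirp_mass(m1, m2, z):
    """Eq. 45: detector-frame chirp mass, in the same units as m1 and m2."""
    return (1.0 + z) * (m1 * m2) ** 0.6 / (m1 + m2) ** 0.2


def chirp_mass_from_frequency(f, fdot):
    """Eq. 46 in SI units: f in Hz, fdot in Hz/s, result in kg."""
    inner = 5.0 / 96.0 * math.pi ** (-8.0 / 3.0) * f ** (-11.0 / 3.0) * fdot
    return (C**3 / G) * inner ** 0.6


def luminosity_distance(z, h0, omega_m, steps=2000):
    """Eq. 47 with E(z) of flat LambdaCDM; h0 in km/s/Mpc, result in Mpc."""
    def inv_e(zp):
        return 1.0 / math.sqrt(omega_m * (1.0 + zp) ** 3 + 1.0 - omega_m)

    dz = z / steps
    # Trapezoidal rule for the integral of dz'/E(z') between 0 and z.
    integral = 0.5 * (inv_e(0.0) + inv_e(z)) * dz
    integral += sum(inv_e(i * dz) for i in range(1, steps)) * dz
    return (1.0 + z) * (C / 1000.0) / h0 * integral


# Hypothetical binary: two 30 solar-mass black holes at z = 0.1.
mz = redshifted_chirp_mass(30.0, 30.0, 0.1)       # in solar masses
d_l = luminosity_distance(0.1, 70.0, 0.3)         # in Mpc
```

Only the redshifted combination is measurable from the waveform, which is why an independent redshift is needed before Eq. 47 can constrain H0 or E(z).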


3.4.2 Sample selection 


In order to use GW sources as cosmological indicators and standard sirens, the required ingredients are i) estimating 
the GW distances, and ii) assigning redshifts to the GW sources. 


Gravitational-wave distances. Every GW detection of a compact binary coalescence provides a measurement 
of the source’s luminosity distance. For a given source, the accuracy of the GW luminosity distance measurement is 
typically O(10%), depending on the parameters of the source and its signal-to-noise ratio. For some systems, the 
distance constraints are much tighter because the distance-inclination degeneracy, which stems from the F(angles)/DL 
factor in Eq. 44, can be broken. This occurs for binaries with misaligned spins leading to measurable orbital preces- 
sion and binaries with asymmetric mass ratios that emit measurable higher-order GW harmonics (Vitale and Chen, 
2018; Abbott et al., 2020; Borhanian et al., 2020; Calderón Bustillo et al., 2021). Occasionally, electromagnetic ob- 
servations of the same source (for example, observations of beamed emission from binary neutron star mergers) can 
be used to independently measure the source inclination, resulting in a tighter GW distance measurement (Mooley 
et al., 2018; Dobie et al., 2020). However, this introduces layers of astrophysical modeling, and in this case the 
standard siren is not calibrated by general relativity alone. 


Assigning redshifts to gravitational-wave sources. The challenge for standard siren cosmology is to identify 
the redshifts of GW sources. Multi-messenger observations, such as neutron star mergers with electromagnetic 
counterparts like short gamma-ray bursts or kilonovae, provide the most straightforward measurement (Holz and 
Hughes, 2005). An electromagnetic counterpart like a kilonova can typically be pinpointed to a specific galaxy, 
thereby identifying the host galaxy of the GW merger. The GW signal provides the distance to the host galaxy, 
while its electromagnetic spectrum provides the redshift. These sources are typically referred to as bright sirens. 


Without an electromagnetic counterpart, the GW event is usually too poorly localized on the sky to allow for a 
unique host galaxy identification (Abbott et al., 2018). Only the loudest, best-localized GW events (1 per several 
hundred events) are expected to have only a single galaxy in their localization volumes (Chen and Holz, 2016). 
Nevertheless, if a sufficiently complete galaxy catalog is available, one can consider all of the galaxies within the 
GW localization volume as potential host galaxies, and statistically marginalize over them. This was the original 
proposal by Schutz (1986), and the method was further developed in a Bayesian context by Del Pozzo (2012); Chen 
et al. (2018). These sources are often called dark sirens. At the typical distances of GW events (greater than several 
hundred Mpc), spectroscopic galaxy catalogs are rare, although photometric galaxy catalogs (with redshifts inferred 
by photometry rather than spectra) can be useful when they overlap with the GW skymap (Soares-Santos et al., 
2019; Palmese et al., 2020). New and upcoming large-scale spectroscopic galaxy surveys like DESI, Taipan, SDSS-V, 
and 4MOST may provide useful galaxy catalogs for statistical GW standard siren analyses, either by cataloging a 
large fraction of the sky or through targeted follow-up of GW event localizations. 
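

The statistical (dark siren) idea of marginalizing over candidate hosts can be illustrated with a deliberately simplified Python sketch. It assumes a Gaussian GW distance measurement, the low-redshift Hubble law DL = cz/H0, a flat prior on H0, and a complete catalog; it ignores selection effects and redshift uncertainties, all of which matter in real analyses. The function name and the numbers are hypothetical.

```python
import math

C_KM_S = 299792.458  # speed of light, km/s


def h0_posterior(h0_grid, d_gw, sigma_d, galaxy_z, weights=None):
    """Normalized posterior on H0, summing the likelihood over candidate hosts."""
    if weights is None:
        weights = [1.0] * len(galaxy_z)
    posterior = []
    for h0 in h0_grid:
        like = 0.0
        for z, w in zip(galaxy_z, weights):
            d_model = C_KM_S * z / h0  # low-redshift Hubble law, in Mpc
            like += w * math.exp(-0.5 * ((d_gw - d_model) / sigma_d) ** 2)
        posterior.append(like)  # flat prior on H0
    norm = sum(posterior)
    return [p / norm for p in posterior]


# Hypothetical event at 43 +/- 3 Mpc with three candidate host redshifts.
h0_grid = list(range(40, 121))
post = h0_posterior(h0_grid, d_gw=43.0, sigma_d=3.0, galaxy_z=[0.009, 0.010, 0.011])
h0_peak = h0_grid[post.index(max(post))]
```

Each candidate host contributes its own peak in H0, so the posterior broadens as the localization volume contains more galaxies at different redshifts.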


In the absence of counterparts or galaxy catalogs, alternative sources of redshift information have been proposed. 
If galaxy catalogs are incomplete but GW events are well-localized, matching the spatial clustering of GW sources 
as a function of distance to the clustering of galaxies as a function of redshift can constrain cosmological parame- 
ters (MacLeod and Hogan, 2008; Oguri, 2016; Mukherjee and Wandelt, 2018; Vijaykumar et al., 2020; Bera et al., 
2020; Mukherjee et al., 2021). 


Another extension of the statistical standard siren method is to use prior knowledge of the merger redshift 
distribution, derived from external measurements of the star formation rate and time delay distribution of binary 
mergers, to compare against the observed gravitational-wave distance distribution (Ding et al., 2019; Ye and Fishbach, 
2021; Leandro et al., 2021). Finally, a promising avenue for gravitational-wave only standard siren analyses is to use 
known features in the source population to directly extract the redshift and distance from the gravitational-wave 
signal alone. If information about the source-frame frequency is available, the redshift can be derived from the 
observed GW frequency. This source-frame GW frequency information can come from features in the source-frame 
mass distribution (Chernoff and Finn, 1993; Taylor et al., 2012; Taylor and Gair, 2012; Farr et al., 2019; You et al., 
2021; Ezquiaga and Holz, 2021) as well as tidal effects in neutron star mergers (Messenger and Read, 2012; Del Pozzo 
et al., 2017). 


As Farr et al. (2019) showed, an especially promising feature in the black hole mass distribution is the lower 
edge of the pair-instability mass gap: a steep drop-off in the black hole mass distribution at ~ 40–65 M⊙, which 
may be accompanied by a pile-up of black holes immediately below the gap at ≈ 35 M⊙. Stellar models (Fowler 
and Hoyle, 1964; Rakavy et al., 1967; Fryer et al., 2001; Heger and Woosley, 2002) show that when the black hole 
progenitor helium star is in the mass range ~ 40–120 M⊙, after the helium burning stage, unstable electron-positron 
pair production occurs in the carbon-oxygen core. This pair production reduces the photon pressure in the stellar 
core, and causes oxygen to explosively ignite. This explosive oxygen burning generates an energetic outwards pulse, 
which can disrupt the star entirely, leaving behind no stellar remnant, or shed off enough mass so that when the 
star collapses to a black hole, its mass is below the mass gap. Because the physics of pair instability depends 
primarily on the mass of the carbon-oxygen core, the locations of the lower and upper edges of the gap are expected 
to be independent of redshift (Farmer et al., 2019). By observing the redshifted mass distribution as a function 
of luminosity distance in gravitational waves, the location of the pair-instability feature(s) can be jointly inferred 
together with the redshift-distance relation (Farr et al., 2019; Mastrogiovanni et al., 2021). Gravitational-wave 
observations of binary black holes support the existence of a bump, followed by a steepening of the black hole mass 
distribution at ~ 40 M⊙ (Fishbach and Holz, 2017; Abbott et al., 2021b; The LIGO Scientific Collaboration et al., 
2021b). The interpretation of this feature as the imprint of pair-instability supernovae is still uncertain; however, 
as black hole population models improve, such features in the black hole mass distribution can be theoretically 
calibrated and reach their potential as robust cosmological probes. 
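

The logic of using a mass feature as a redshift indicator can be made concrete with a small sketch. If a feature sits at a known source-frame mass, the detector-frame mass at which it is observed gives 1+z directly. The code below assumes the feature location is known exactly, whereas real analyses infer it jointly with the cosmology; the function names and numbers are hypothetical.

```python
def redshift_from_mass_feature(m_detector, m_source):
    """Redshift implied by observing a source-frame mass m_source at m_detector."""
    return m_detector / m_source - 1.0


def distance_redshift_points(binned_edges, m_edge_source):
    """Turn (luminosity distance, detector-frame edge mass) pairs into (DL, z) pairs."""
    return [
        (d_l, redshift_from_mass_feature(m_det, m_edge_source))
        for d_l, m_det in binned_edges
    ]


# Hypothetical example: an edge at 40 solar masses in the source frame,
# observed at 48 and 60 solar masses in two luminosity-distance bins (Mpc).
points = distance_redshift_points([(1000.0, 48.0), (3000.0, 60.0)], 40.0)
```

The resulting (DL, z) pairs play the same role as host-galaxy redshifts in the bright siren case, without any electromagnetic information.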


3.4.3 Measurements 


While GWs directly provide the luminosity distance to the source, there are multiple ways to estimate its red- 
shift. As discussed in the previous section, standard siren redshift measurements fall under three main categories: 
electromagnetic counterparts, galaxy catalogs, and features in the GW source population. 


3.4.3.1 Electromagnetic counterparts 


The multi-messenger binary neutron star detection, GW170817, provided the first standard siren measurement 
of the Hubble constant (Abbott et al., 2017c,a). Gravitational-wave parameter estimation provided a luminosity 
distance of 43.8 […] Mpc. The kilonova optical counterpart allowed for the identification of a unique host galaxy, 
NGC4993. Because this event was relatively nearby, the measured redshift of NGC4993 is significantly affected by 
its peculiar (non-Hubble flow) velocity. In this case, the peculiar velocity is large (~ 300 km/s) because NGC4993 
is near the Great Attractor. Correcting for inter-group and bulk flow velocities, the Hubble flow velocity is 
3017 ± 166 km/s. At z ~ 0.01, this event is only sensitive to the first-order linear redshift-distance relation, and 
the resulting Hubble constant measurement is H0 = 70 […] km s⁻¹ Mpc⁻¹ (maximum a-posteriori value and 68.3% 
highest density credible interval, taking a flat-in-log prior on H0). With improved analysis of the gravitational-wave 
signal and a slightly updated distance measurement, the Hubble constant measurement was updated to H0 = […] km 
s⁻¹ Mpc⁻¹ (Abbott et al., 2019a). In addition to measuring the Hubble constant, GW170817 and its electromagnetic 
counterpart enabled impressively tight constraints on cosmological modified gravity theories, including the speed of 
gravity and gravitational-wave friction (Abbott et al., 2017; Amendola et al., 2018; Ezquiaga and Zumalacárregui, 
2017; Sakstein and Jain, 2017; Creminelli and Vernizzi, 2017; Baker et al., 2017; Crisostomi and Koyama, 2018; Lagos 
et al., 2019; Pardo et al., 2018; Abbott et al., 2019b). 
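

As a back-of-the-envelope check of the bright siren measurement, the linear Hubble law H0 = v/DL can be evaluated with the central values quoted above (a Hubble flow velocity of 3017 km/s and a distance of about 43.8 Mpc). This is only a sketch: the published value comes from a full posterior analysis with asymmetric distance uncertainties, so it need not coincide with this simple ratio. The function name is hypothetical.

```python
def hubble_constant_low_z(v_hubble_km_s, distance_mpc):
    """Linear Hubble law, valid only at low redshift: H0 = v / D."""
    return v_hubble_km_s / distance_mpc


h0 = hubble_constant_low_z(3017.0, 43.8)      # about 68.9 km/s/Mpc

# Contribution of the 166 km/s velocity uncertainty alone (about 5.5%);
# the distance uncertainty is not included here.
h0_err_from_velocity = h0 * (166.0 / 3017.0)
```

The velocity term alone gives an error of a few km s⁻¹ Mpc⁻¹, so the total uncertainty on H0 from this event is dominated by the distance measurement.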


3.4.3.2 Galaxy catalogs 


To date, the only gravitational-wave event with a confident electromagnetic counterpart is GW170817. A possible 
AGN flare association was identified for the binary black hole event GW190521 (Graham et al., 2020), but the 
association is debatable (Ashton et al., 2021; De Paolis et al., 2020; Palmese et al., 2021). However, the statistical 
galaxy catalog method has been applied to several gravitational-wave events. As a proof of concept, Fishbach et al. 
(2019) demonstrated the statistical method with GW170817, marginalizing over galaxies in the GLADE catalog 
(Dálya et al., 2018), rather than using the uniquely identified host galaxy NGC4993. Because GW170817 was 
exceptionally loud and close-by, and all three detectors of the LIGO-Virgo network were operational, it was localized 
to only 16 deg² with 90% credibility (215 Mpc³ assuming standard cosmological parameters from Planck Collaboration 
et al., 2015). This small localization volume contains only one large group of galaxies (the group containing NGC4993) 
at z ~ 0.01, and so the statistical standard siren measurement of Ho from GW170817 is almost as informative 
as the counterpart measurement. In most cases, the gravitational-wave localization volume contains O(10+-10°) 
potential host galaxies, and so the statistical standard siren method would be substantially less informative even if 
we had complete galaxy catalogs with well-measured redshifts. The two best statistical standard sirens, excluding
GW170817, are the binary black hole event GW170814 (Abbott et al., 2017b) and the (probable) binary black hole
event GW190814 (Abbott et al., 2020). (The secondary mass of GW190814 is ambiguous, and GW190814 may be a
neutron star-black hole system.) Both of these events lack electromagnetic counterparts, but their sky position and
gravitational-wave localization are ideal for the statistical galaxy catalog method. Not only are they the best-localized
events from the first three observing runs (other than the binary neutron star event GW170817), but they also both 
fall within the footprint of the Dark Energy Survey (DES, Dark Energy Survey Collaboration et al., 2016). 


GW170814 was the first three-detector gravitational-wave event, observed by Virgo in addition to the two LIGO
observatories in their second observing run. Using data from all three detectors enabled a 90% sky localization of only
60 deg$^2$ (compared to 1160 deg$^2$ using only data from the two LIGO detectors). Correlating the gravitational-wave
sky map and distance measurement of $540^{+130}_{-210}$ Mpc with the photometric galaxy catalog from DES, Soares-Santos
et al. (2019) performed the first standard siren measurement of the Hubble constant using a binary black hole. With
only a single event, the measurement was relatively broad, with the 68% posterior credible interval encompassing
$\sim 60\%$ of the prior, but nevertheless there was a clear peak at $H_0 \sim 75$ km s$^{-1}$ Mpc$^{-1}$ associated with an over-density
of galaxies at $z \sim 0.12$.
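
The way a galaxy over-density imprints a peak on the $H_0$ posterior can be illustrated with a deliberately simplified sketch. It is not the pipeline of Soares-Santos et al. (2019) or Fishbach et al. (2019): it assumes a Gaussian distance likelihood, the low-redshift Hubble law $d_L = cz/H_0$, equal host probability for every galaxy, a flat prior on $H_0$, and it omits the selection effects and catalog incompleteness discussed in Sect. 3.4.3.3. The catalog, distance, and function name are hypothetical.

```python
import math

C_KM_S = 299_792.458  # speed of light [km/s]

def dark_siren_posterior(galaxy_redshifts, d_obs, sigma_d, h0_grid):
    """Normalised H0 posterior on h0_grid for one siren, flat prior in H0.

    Each catalog galaxy is treated as an equally likely host, and the
    gravitational-wave distance likelihood is a Gaussian (both assumptions).
    """
    weights = []
    for h0 in h0_grid:
        total = 0.0
        for z in galaxy_redshifts:
            d_pred = C_KM_S * z / h0  # low-redshift Hubble law, in Mpc
            total += math.exp(-0.5 * ((d_pred - d_obs) / sigma_d) ** 2)
        weights.append(total)
    norm = sum(weights)
    return [w / norm for w in weights]

# Hypothetical catalog: an over-density near z = 0.05 plus two field galaxies.
catalog = [0.030, 0.049, 0.050, 0.050, 0.051, 0.070]
h0_grid = list(range(20, 141))  # km/s/Mpc
posterior = dark_siren_posterior(catalog, d_obs=200.0, sigma_d=30.0, h0_grid=h0_grid)
h0_peak = h0_grid[posterior.index(max(posterior))]
print(f"posterior peaks at H0 = {h0_peak} km/s/Mpc")
```

In this toy setup the field galaxies produce secondary bumps, while the over-density dominates the posterior; with thousands of candidate hosts the bumps blend and the constraint broadens.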


GW190814, detected by LIGO and Virgo in their third observing run, is the best-localized dark standard siren
observed to date. It was localized to 18 deg$^2$ (90% credibility) on the sky. At $241^{+41}_{-45}$ Mpc, it is nearby and has an
impressive signal-to-noise ratio of 25. Furthermore, because of its asymmetric masses (mass ratio of approximately
1:10), the gravitational-wave signal contains detectable higher harmonics, which reduce the distance-inclination
degeneracy and yield a tighter distance measurement. Combining the gravitational-wave localization with the GLADE
galaxy catalog, Abbott et al. (2020) performed a statistical standard siren measurement of the Hubble constant,
finding a broad peak at $H_0 \approx 75$ km s$^{-1}$ Mpc$^{-1}$ (with the 68% highest posterior density interval comprising 60% of
the prior range). Although GW190814 is very nearby for a gravitational-wave event, it is at the limit of where
currently-available spectroscopic galaxy catalogs are useful. At these distances, the GLADE catalog is 40%
complete. Meanwhile, like GW170814, GW190814 lies within the DES footprint. Although the DES catalog contains
photometric, rather than spectroscopic redshifts, which means larger errors on each galaxy's redshift, it does not
suffer from incompleteness. Palmese et al. (2020) used the DES galaxies within the GW190814 sky map to measure
the Hubble constant, obtaining $H_0 \approx 78$ km s$^{-1}$ Mpc$^{-1}$, consistent with the result of Abbott et al. (2020).


3.4.3.3 Standard siren population 


In order to achieve competitive cosmological constraints, information must be combined across multiple standard 
sirens. Analyzing a population of standard sirens requires a careful treatment of measurement uncertainties and 
selection effects (Mandel et al., 2019; Chen et al., 2018; Mortlock et al., 2019). The importance of incorporating 
selection effects can be understood by considering that gravitational-wave detectors are significantly more likely to 
observe sources at smaller distances, but there are more potential host galaxies at higher redshifts. If the analysis did 
not account for selection effects, it would tend to overestimate the redshifts of gravitational-wave events and therefore 
overestimate the Hubble constant. Meanwhile, because the probability of detecting a gravitational-wave source is 
a strong function of its mass and distance (and, to a lesser degree, the component spins), we must simultaneously 
fit the gravitational-wave source distribution, particularly the astrophysical mass distribution and distance/redshift 
distribution, with the cosmological parameters. For example, if the wrong binary black hole mass distribution is 
assumed in the statistical galaxy catalog method, the recovered cosmological parameters will be biased (Abbott 
et al., 2021a; Mastrogiovanni et al., 2021; The LIGO Scientific Collaboration et al., 2021c). The assumed black 
hole and neutron star spin distribution can also affect the cosmological inference, both because the binary spin 
impacts the gravitational-wave detection probability and because of mild degeneracies between the measured binary 
spin, inclination and luminosity distance. The latter effect was already noted for the GW170817 standard siren 
measurement; assuming different priors on the neutron star spin magnitudes yielded slightly different posteriors on 
the Hubble constant (Abbott et al., 2019a). In addition to the gravitational-wave data, care must be taken in the 
statistical treatment of the redshift information. If redshifts are supplied from a galaxy catalog, particular attention 
is required in treating galaxy catalog incompleteness (Fishbach et al., 2019; Gray et al., 2020; Finke et al., 2021; 
Gray et al., 2021). 


The latest gravitational-wave catalog consists of $\sim 90$ events from three observing runs of LIGO and Virgo (The
LIGO Scientific Collaboration et al., 2021a). Using redshift information either from galaxy catalogs or from the
redshifted binary black hole mass spectrum, these events have been used in combination with the counterpart
standard siren measurement of GW170817 to constrain the expansion history $H(z)$ and several cosmological modified
gravity theories (Finke et al., 2021; The LIGO Scientific Collaboration et al., 2021c; Palmese et al., 2021; Mancarella
et al., 2021). With the relatively low-redshift sample, the best measured cosmological parameter remains the Hubble
constant, and the constraints using all events represent a $\sim 20\%$ improvement over the measurement from GW170817
and its counterpart (see Fig. 17, left panel).


Figure 17: Constraints on $H_0$ and $H(z)$ from GWs used as standard sirens. Left panel: $H_0$ posterior obtained
from the combination of the signal of 42 black hole-black hole mergers from GWTC-3 with the detection of
GW170817 (The LIGO Scientific Collaboration et al., 2021c). Right panel: Forecasts on $H(z)$ measurements
obtained from the simulation of one year (blue line) and five years (orange line) of detections from the Advanced
LIGO and Virgo detectors (Farr et al., 2019). Images reproduced with permission from The LIGO Scientific
Collaboration et al. (2021c) and Farr et al. (2019), copyright by Astrophysical Journal.


3.4.4 Systematic effects 


The limiting systematic uncertainty for standard siren measurements is the detector calibration, specifically the
amplitude uncertainty. Each detector's amplitude response uncertainty translates to a systematic distance uncertainty
for the GW source, contributing at the few-percent level. For individual events, the statistical distance uncertainty
of $\mathcal{O}(10\%)$ dominates the calibration uncertainty. But when stacking events to infer cosmological parameters, unlike
the statistical distance uncertainty, the calibration uncertainty may not average out. An important prerequisite for
reaching a percent-level $H_0$ measurement with standard sirens is to reduce the amplitude calibration uncertainty
below 1% (Sun et al., 2021a).
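
Why a calibration error sets a floor can be seen in a short sketch. It assumes the worst case in which the calibration error is fully correlated across events, so that it adds in quadrature to a statistical term that averages down as $1/\sqrt{N}$. The 10% and 2% defaults are illustrative choices matching the orders of magnitude quoted above, not measured values, and the function name is hypothetical.

```python
import math

def stacked_distance_uncertainty(n_events, stat_frac=0.10, cal_frac=0.02):
    """Fractional distance uncertainty after stacking n_events sirens.

    The statistical term shrinks as 1/sqrt(N); the calibration term is
    assumed fully correlated between events and therefore does not shrink.
    """
    stat_term = stat_frac / math.sqrt(n_events)
    return math.hypot(stat_term, cal_frac)

for n in (1, 10, 100, 1000):
    total = stacked_distance_uncertainty(n)
    print(f"N = {n:4d}: total fractional uncertainty = {total:.4f}")
```

Under these assumptions the total never drops below the calibration term, which is why a sub-percent calibration is a prerequisite for a percent-level $H_0$.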


As the standard siren catalog continues to grow, other uncertainties in the gravitational-wave distance
measurements will become important. One of these uncertainties is the gravitational waveform model. Extracting the
distance of the source from the gravitational-wave measurement requires a gravitational waveform model that is not
perfectly known, especially for systems with strong matter effects, extreme spins or mass ratios (Huang et al., 2021).
For standard sirens at larger distances, the gravitational-wave signal may be (de)magnified due to weak gravitational
lensing by matter along the line of sight. Most of these uncertainties may be incorporated into the statistical
framework and contribute to a statistical rather than systematic uncertainty. For example, if the distribution of lensing
magnifications is known, this contribution can be marginalized over in the GW distance likelihood (Holz and Hughes,
2005; Hirata et al., 2010; Sathyaprakash et al., 2010). As discussed in Sect. 3.4.3, the astrophysical distributions of
the masses, spins and distances of black hole and neutron star mergers must be simultaneously inferred with the
cosmological parameters, especially when analyzing a population of standard sirens at cosmological distances. Even
compared to the current large statistical uncertainties, fixing the binary black hole mass distribution in the galaxy
catalog standard siren analysis results in a significant systematic uncertainty, whereas the joint inference
transfers the systematic uncertainty to a statistical uncertainty that converges with many events (The LIGO Scientific
Collaboration et al., 2021c).


There are also uncertainties in the redshift measurements that, if not properly understood, can contribute to a 
systematic uncertainty. The counterpart standard siren method, where the redshift information comes directly from 
a unique host galaxy identification, is the least susceptible to systematic effects. A possible systematic uncertainty 
in the redshift measurement can come from errors in the peculiar velocity correction, but the statistics of peculiar 
velocities are well-understood and, especially at typical standard siren distances, contribute a negligible fraction 
of the uncertainty budget. On the other hand, when galaxy catalogs are used for the redshift information, they 
introduce more potential sources of systematic uncertainty. Factors such as catalog incompleteness, photometric 
redshift uncertainties, and the galaxies’ probabilities of hosting gravitational-wave sources must be understood. If 
the redshift information is supplied by features in the source distribution, it is important to check that the population 
model is not mis-specified. For example, fitting the binary black hole mass distribution to a power law, where the 
true distribution more closely resembles a mixture model between a power law and a Gaussian, would lead to biased 
recovery of the mass distribution and the cosmological parameters. In general, the source distribution also needs to 
be calibrated against theoretical models. If the source mass distribution evolves with redshift (Fishbach et al., 2021),
theoretical guidance is required to disentangle the source mass evolution from the cosmological redshift.


3.4.5 Main results and forecasts 


The current best standard siren constraints are dominated by the Hubble constant measurement from GW170817 and 
its electromagnetic counterpart, which yielded $H_0 \approx 70$ km s$^{-1}$ Mpc$^{-1}$. However, with $\sim 90$ gravitational-wave
events detected to date, the population of standard sirens without counterparts is beginning to contribute. Out of 
the gravitational-wave events without counterparts, a couple of events, namely GW170814 and GW190814, have 
been particularly well-localized so that comparing their localization posteriors to a galaxy catalog yields only ~ 1 
probable galaxy structure that contains the host galaxy, resulting in a uni-modal, fairly informative Hubble constant 
measurement. The remaining dozens of events have also been used for standard siren analyses in conjunction 
with galaxy catalogs, but care is required in the interpretation of these results. Unless the source population, 
particularly the binary black hole mass distribution, is simultaneously inferred with the cosmological parameters, 
hidden assumptions about the source population can impact the cosmological inference and result in overly optimistic 
constraints. So far, the only analyses that simultaneously fit the source population and cosmological parameters do so 
without incorporating galaxy catalog information (The LIGO Scientific Collaboration et al., 2021c). With the latest 
gravitational-wave catalog GWTC-3, these methods yield a 17% improvement in the Hubble constant measurement 
over the measurement from GW170817 and its counterpart (see the left panel of Fig. 17; The LIGO Scientific 
Collaboration et al., 2021c). 


The most robust standard sirens are gravitational-wave sources with electromagnetic counterparts, typically 
binary neutron stars, although some neutron star-black hole mergers may also produce electromagnetic emission. 
With the current ground-based gravitational-wave detectors, these sources will predominantly be sensitive to the 
Hubble constant, and with $N$ sources with counterparts, we expect the fractional uncertainty on the Hubble constant
to shrink as $15\%/\sqrt{N}$ (Chen et al., 2018).
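
The scaling quoted from Chen et al. (2018) can be inverted to estimate how many counterpart sirens a target precision requires. The sketch below only evaluates that scaling; the function names are hypothetical, and it assumes the 15% single-event figure holds for every event and that no systematic floor intervenes.

```python
import math

def h0_fractional_uncertainty(n_events, single_event_frac=0.15):
    """Fractional H0 uncertainty after combining n_events counterpart sirens."""
    if n_events < 1:
        raise ValueError("n_events must be at least 1")
    return single_event_frac / math.sqrt(n_events)

def events_needed(target_frac, single_event_frac=0.15):
    """Smallest N for which single_event_frac / sqrt(N) <= target_frac."""
    # The small offset guards against floating-point noise before rounding up.
    return math.ceil((single_event_frac / target_frac) ** 2 - 1e-9)

for target in (0.05, 0.02, 0.01):
    print(f"{target:.0%} precision needs about {events_needed(target)} events")
```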


For the majority of gravitational-wave events that lack counterparts, galaxy catalogs can be used for the redshift 
information. Further work needs to be done to develop galaxy catalogs specifically for the standard siren application, 
manage catalog incompleteness, and jointly fit the source population together with the cosmological parameters to 
avoid systematic bias. Another promising method is to use features in the source mass distribution to fit cosmological 
parameters together with the source population in a gravitational-wave only analysis. Farr et al. (2019) showed that 
leveraging the pair-instability feature in the black hole mass distribution can provide percent-level constraints on 
H(z) at z = 0.8 within 5 years of Advanced LIGO observations (see the right panel of Fig. 17). By combining 
binary black holes, which can be observed at higher redshifts, with nearby binary neutron stars with counterparts, 
the expansion history can therefore be measured out to z ~ 1.5. For this method to provide robust cosmological 
constraints, further progress is required in theoretical models of the black hole mass distribution. In particular, the 
redshift evolution of the source mass distribution must be theoretically understood. 


Standard sirens are unique cosmological probes in that they simultaneously probe the background cosmology 
and gravitational perturbations, namely the propagation of gravitational waves. Beyond constraining the Hubble 
constant, standard sirens are therefore especially promising for constraining dark energy theories both through their 
effects on the background cosmology and their effects on gravitational-wave propagation. 


The era of gravitational-wave cosmology has just begun. The gravitational-wave catalog is growing at an incredible 
rate, and by the late 2020s, the gravitational-wave detector network of LIGO, Virgo and KAGRA is expected to detect
hundreds to thousands of events annually. In the coming decades, the space-based gravitational-wave detector LISA 
is expected to launch, and the next generation of ground-based gravitational-wave detectors Cosmic Explorer (Reitze 
et al., 2019) and the Einstein Telescope (Punturo et al., 2010) may become a reality. The growth of the gravitational- 
wave dataset is accompanied by new electromagnetic telescopes to hunt counterparts, galaxy surveys to expand 
redshift catalogs, theoretical developments to model the gravitational-wave source population, and computational 
techniques to carry out the standard siren inference. Standard siren cosmology is a rapidly growing field with a 
promising future. 


3.5 Time Delay Cosmography


Time-delay cosmography uses measurements of relative arrival times of multiply gravitationally lensed sources to 
measure an absolute scale of the Universe. The method was originally proposed by Refsdal (1964) over half a 
century ago, prior to the discovery of the first extra-galactic gravitational lens. The methodology provides a one-step 
measurement of the Hubble constant, completely independent of the local distance ladder or probes anchored with 
sound horizon physics, such as the cosmic microwave background (CMB). Figure 18 illustrates different galaxy-scale 
gravitational lenses with a multiply imaged quasar in different configurations (Suyu et al., 2017). 


Figure 18: Four quadruply lensed quasar systems and one doubly lensed quasar system from the H0LiCOW
sample (panels: HE 0435-1223, HE 1104-1805, WFI2033-4723, B1608+656, RXJ1131-1231). The lens name is
indicated above each panel. The color images are composed using 2 (for B1608+656)
or 3 (for other lenses) HST imaging bands in the optical and near-infrared. North is up and east is left. Image
reproduced with permission from Suyu et al. (2017), copyright by Monthly Notices of the Royal Astronomical
Society.


3.5.1 Basic idea and equations 


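The phenomenon of gravitational lensing can be described by the lens equation, which maps the source plane coordinate
$\boldsymbol{\beta}$ to the image plane coordinate $\boldsymbol{\theta}$:

$$\boldsymbol{\beta} = \boldsymbol{\theta} - \boldsymbol{\alpha}(\boldsymbol{\theta}) , \tag{48}$$


where $\boldsymbol{\alpha}$ is the angular shift on the sky between the original un-lensed and the lensed observed position of an object.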


For a single deflector plane, the lens equation can be expressed in terms of the physical deflection angle $\hat{\boldsymbol{\alpha}}$ as:


$$\boldsymbol{\beta} = \boldsymbol{\theta} - \frac{D_{\rm ds}}{D_{\rm s}} \hat{\boldsymbol{\alpha}}(\boldsymbol{\theta}) , \tag{49}$$


where $D_{\rm s}$ and $D_{\rm ds}$ are the angular diameter distances from the observer to the source and from the deflector to the
source, respectively. In the single lens plane regime, we can introduce the lensing potential $\psi$ such that the reduced
deflection angle is the gradient of the potential:


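$$\boldsymbol{\alpha}(\boldsymbol{\theta}) = \nabla \psi(\boldsymbol{\theta}) , \tag{50}$$
and the lensing convergence as:
$$\kappa(\boldsymbol{\theta}) = \frac{1}{2} \nabla^2 \psi(\boldsymbol{\theta}) . \tag{51}$$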


Physically, the lensing convergence in this regime corresponds to the projected surface mass density $\Sigma$ normalized to
the critical lensing surface density $\Sigma_{\rm crit}$:


$$\kappa(\boldsymbol{\theta}) = \frac{\Sigma(\boldsymbol{\theta})}{\Sigma_{\rm crit}} , \tag{52}$$
with the critical lensing surface density:
$$\Sigma_{\rm crit} = \frac{c^2 D_{\rm s}}{4\pi G D_{\rm d} D_{\rm ds}} , \tag{53}$$


where $D_{\rm d}$ is the angular diameter distance to the deflector, $c$ is the speed of light and $G$ is the gravitational constant$^9$.


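The relative arrival time $\Delta t_{\rm AB}$ between two images $\boldsymbol{\theta}_{\rm A}$ and $\boldsymbol{\theta}_{\rm B}$ originating from the same source is given by:


$$\Delta t_{\rm AB} = \frac{D_{\Delta t}}{c} \left[ \tau(\boldsymbol{\theta}_{\rm A}, \boldsymbol{\beta}) - \tau(\boldsymbol{\theta}_{\rm B}, \boldsymbol{\beta}) \right] , \tag{54}$$


where:


$$\tau(\boldsymbol{\theta}, \boldsymbol{\beta}) = \frac{(\boldsymbol{\theta} - \boldsymbol{\beta})^2}{2} - \psi(\boldsymbol{\theta}) \tag{55}$$


is the Fermat potential (Schneider, 1985; Blandford and Narayan, 1986), and:


$$D_{\Delta t} \equiv (1 + z_{\rm d}) \frac{D_{\rm d} D_{\rm s}}{D_{\rm ds}} \tag{56}$$


is the time-delay distance (Refsdal, 1964; Schneider et al., 1992; Suyu et al., 2010).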
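
To make Eqs. 54 and 56 concrete, the following sketch evaluates the time-delay distance for a given lens and source redshift and converts a Fermat potential difference into a delay in days. It assumes a flat $\Lambda$CDM background with illustrative parameters ($H_0 = 70$ km s$^{-1}$ Mpc$^{-1}$, $\Omega_{\rm m} = 0.3$), ignores radiation and line-of-sight perturbations, and uses hypothetical function names; it is not taken from any lens-modeling package.

```python
import math

C_KM_S = 299_792.458             # speed of light [km/s]
MPC_KM = 3.0856775814913673e19   # kilometres per megaparsec
DAY_S = 86_400.0                 # seconds per day
ARCSEC_RAD = math.pi / (180.0 * 3600.0)

def comoving_distance(z, h0, omega_m, steps=2000):
    """Line-of-sight comoving distance [Mpc] in flat LCDM (Simpson's rule)."""
    def inv_e(zp):
        return 1.0 / math.sqrt(omega_m * (1.0 + zp) ** 3 + 1.0 - omega_m)
    h = z / steps  # steps must be even for Simpson's rule
    total = inv_e(0.0) + inv_e(z)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * inv_e(i * h)
    return C_KM_S / h0 * total * h / 3.0

def time_delay_distance(z_d, z_s, h0=70.0, omega_m=0.3):
    """Eq. 56: D_dt = (1 + z_d) D_d D_s / D_ds [Mpc] for zero curvature."""
    dc_d = comoving_distance(z_d, h0, omega_m)
    dc_s = comoving_distance(z_s, h0, omega_m)
    d_d = dc_d / (1.0 + z_d)
    d_s = dc_s / (1.0 + z_s)
    d_ds = (dc_s - dc_d) / (1.0 + z_s)  # this simple form holds only in flat space
    return (1.0 + z_d) * d_d * d_s / d_ds

def time_delay_days(d_dt_mpc, delta_fermat_arcsec2):
    """Eq. 54: delay [days] for a Fermat potential difference in arcsec^2."""
    seconds = d_dt_mpc * MPC_KM / C_KM_S * delta_fermat_arcsec2 * ARCSEC_RAD ** 2
    return seconds / DAY_S

d_dt = time_delay_distance(z_d=0.5, z_s=2.0)
print(f"D_dt = {d_dt:.0f} Mpc, delay = {time_delay_days(d_dt, 0.1):.1f} days")
```

Because every distance scales as $1/H_0$, halving `h0` doubles the returned $D_{\Delta t}$, which is the inverse proportionality that time-delay cosmography exploits.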


Constraints on the Fermat potential difference $\Delta\tau_{\rm AB}$ and a measured time delay $\Delta t_{\rm AB}$ allow one to constrain
the time-delay distance $D_{\Delta t}$. This absolute physical distance anchors the scale of the Universe within the redshifts
involved in the lensing configuration. The Hubble constant is inversely proportional to the absolute scales of the
Universe, and thus scales with $D_{\Delta t}$ as:

$$H_0 \propto D_{\Delta t}^{-1} , \tag{57}$$


mildly dependent on the relative expansion history from current time ($z = 0$) to the redshift of the deflector and
source.


While the time delay $\Delta t_{\rm AB}$ can be directly measured (see Sect. 3.5.3), the relative Fermat potential $\Delta\tau_{\rm AB}$ is
not a direct observable. The primary sources of information for inferring $\Delta\tau_{\rm AB}$ are positional constraints and extended distortions
from the lensing effect. However, there are degeneracies inherent in gravitational lensing that limit the amount of 
information accessible by lensing distortions (e.g., Falco et al., 1985; Gorenstein et al., 1988; Kochanek, 2002; Saha 
and Williams, 2006; Schneider and Sluse, 2013, 2014; Birrer et al., 2016; Unruh et al., 2017; Birrer, 2021). 


The most prominent lensing degeneracy impacting the time-delay prediction is the mass-sheet degeneracy (MSD,
Falco et al., 1985). The MSD is a multiplicative transform of the lens equation (Eq. 48) which preserves image
positions (and any higher order relative differentials of the lens equation) under a linear source displacement $\boldsymbol{\beta} \to \lambda\boldsymbol{\beta}$
combined with a transformation of the convergence field:


$$\kappa_\lambda(\boldsymbol{\theta}) = \lambda \kappa(\boldsymbol{\theta}) + (1 - \lambda) . \tag{58}$$


The term $(1 - \lambda)$ in Eq. 58 above describes an infinite sheet of convergence (or mass), and hence the name mass-sheet
transform (MST). Only observables related to the unlensed apparent source size, to the unlensed apparent brightness,
or to the lensing potential are able to break this degeneracy. Thus, the same relative lensing observables can result
if the mass profile is scaled by the factor $\lambda$ with the addition of a sheet of convergence (or mass) of $\kappa(\boldsymbol{\theta}) = (1 - \lambda)$.


The Fermat potential (Eq. 55) scales with $\lambda$ as:
$$\Delta\tau_{{\rm AB},\lambda} = \lambda \Delta\tau_{\rm AB} , \tag{59}$$


and so does the time delay:
$$\Delta t_{{\rm AB},\lambda} = \lambda \Delta t_{\rm AB} . \tag{60}$$


When transforming a lens model with a mass-sheet transformation, the inference of the time-delay distance (Eq. 56)
from a measured time delay and inferred Fermat potential transforms as:


$$D_{\Delta t,\lambda} = \lambda^{-1} D_{\Delta t} . \tag{61}$$
Thus, the Hubble constant, when inferred from the time-delay distance $D_{\Delta t}$, transforms from Eq. 57 as:


$$H_{0,\lambda} = \lambda H_0 . \tag{62}$$
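
The invariance of image positions and the rescaling of the Fermat potential difference can be checked numerically on a toy lens. The sketch below uses a one-dimensional singular isothermal sphere (SIS), chosen here purely for convenience and not singled out by the text; its Einstein radius, source position, $\lambda$, and the model value of $H_0$ are hypothetical numbers.

```python
def sis_images(beta, theta_e):
    """Image positions of a 1D SIS lens, valid for |beta| < theta_e."""
    return beta + theta_e, beta - theta_e

def fermat(theta, beta, theta_e, lam=1.0):
    """Fermat potential (Eq. 55) of the mass-sheet-transformed SIS."""
    # Transformed potential: lam * psi_SIS plus the potential of a uniform sheet.
    psi = lam * theta_e * abs(theta) + (1.0 - lam) * theta ** 2 / 2.0
    beta_lam = lam * beta  # source position rescaled by the MST
    return (theta - beta_lam) ** 2 / 2.0 - psi

def lens_residual(theta, beta, theta_e, lam=1.0):
    """theta - alpha_lam(theta) - lam * beta, which vanishes at an image."""
    sign = 1.0 if theta > 0 else -1.0
    alpha = lam * theta_e * sign + (1.0 - lam) * theta
    return theta - alpha - lam * beta

beta, theta_e, lam, h0_model = 0.2, 1.0, 0.9, 70.0
theta_a, theta_b = sis_images(beta, theta_e)

dtau = fermat(theta_a, beta, theta_e) - fermat(theta_b, beta, theta_e)
dtau_lam = fermat(theta_a, beta, theta_e, lam) - fermat(theta_b, beta, theta_e, lam)
ratio = dtau_lam / dtau          # expected to equal lam (Eq. 59)
h0_transformed = lam * h0_model  # Eq. 62

print(f"residuals: {lens_residual(theta_a, beta, theta_e, lam):.1e}, "
      f"{lens_residual(theta_b, beta, theta_e, lam):.1e}")
print(f"Fermat potential ratio = {ratio:.3f}, H0 shifts from {h0_model} to {h0_transformed}")
```

The same two image positions solve the lens equation before and after the transform, so imaging data alone cannot distinguish the two models even though the inferred $H_0$ differs by the factor $\lambda$.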


$^9$ The critical lensing surface density considers only mass relative to the mean background cosmological density.


An MSD effect relative to a proposed deflector model might occur either within the mass distribution of the main
deflector, referred to as internal MSD with $\lambda_{\rm int}$, or be caused by inhomogeneities along the line of sight (LOS) of
the strong lens system.


Mass over- or under-densities along the LOS of the strong lensing system cause, to first order, shear and
convergence perturbations. Reduced shear distortions have a measurable imprint on the azimuthal structure of the strong
lensing system (see e.g., Birrer, 2021) while the convergence component of the LOS, denoted as $\kappa_{\rm ext}$, is equivalent to
an MST, and thus not directly measurable from imaging data. The lensing kernel impacting the linear distortions,
both shear and $\kappa_{\rm ext}$, is different from the standard weak lensing kernel (McCully et al., 2014, 2017; Birrer et al.,
2017, 2020; Fleury et al., 2021b).


We define $D^{\rm lens}$ as the specific angular diameter distance along the line of sight of the lens, corrected for
LOS structure, and $D^{\rm bkg}$ as the angular diameter distance from the homogeneous background metric without any
perturbative contributions. $D^{\rm lens}$ and $D^{\rm bkg}$ are related through the convergence terms as (Birrer et al., 2020):


$$D_{\rm d}^{\rm lens} = (1 - \kappa_{\rm d}) D_{\rm d}^{\rm bkg} , \tag{63}$$
$$D_{\rm s}^{\rm lens} = (1 - \kappa_{\rm s}) D_{\rm s}^{\rm bkg} , \tag{64}$$
$$D_{\rm ds}^{\rm lens} = (1 - \kappa_{\rm ds}) D_{\rm ds}^{\rm bkg} , \tag{65}$$


where $\kappa_{\rm d}$ is the weak lensing effect from the observer to the deflector, $\kappa_{\rm s}$ from the observer to the source, and $\kappa_{\rm ds}$
from the deflector to the source, respectively (Birrer et al., 2020). The lensing kernel impacting the time delay can
be described as the product of three different angular diameter distances entering $D_{\Delta t}$ in Equation 56 (Birrer et al.,
2020; Fleury et al., 2021a),

$$1 - \kappa_{\rm ext} = \frac{(1 - \kappa_{\rm d})(1 - \kappa_{\rm s})}{1 - \kappa_{\rm ds}} . \tag{66}$$

MSD uncertainties or biases may also arise relative to assumptions made in the radial density profile of the main
deflector galaxy (see, e.g., Kochanek, 2002; Read et al., 2007; Schneider and Sluse, 2013; Coles et al., 2014; Xu et al.,
2016; Birrer et al., 2016; Unruh et al., 2017; Sonnenfeld, 2018; Kochanek, 2020; Blum et al., 2020; Birrer et al., 2020;
Kochanek, 2021). Any lensing-only constraints on the radial density profile are over-constrained, and such constraints rely
on the functional form imposed.


The total MST, i.e. the relevant transform to constrain for an accurate cosmography and $H_0$ measurement, is
the product of the internal and external MST (e.g., Schneider and Sluse, 2013; Birrer et al., 2016, 2020):


$$\lambda = (1 - \kappa_{\rm ext}) \times \lambda_{\rm int} . \tag{67}$$
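
As a small worked example of Eqs. 66 and 67, the sketch below combines three line-of-sight convergence terms into $\kappa_{\rm ext}$ and then into the total MST parameter. The function names and all numerical values are hypothetical and serve only to show how the terms enter.

```python
def external_convergence(kappa_d, kappa_s, kappa_ds):
    """kappa_ext from Eq. 66: 1 - kappa_ext = (1 - k_d)(1 - k_s) / (1 - k_ds)."""
    return 1.0 - (1.0 - kappa_d) * (1.0 - kappa_s) / (1.0 - kappa_ds)

def total_mst(kappa_ext, lambda_int):
    """Total mass-sheet transform parameter from Eq. 67."""
    return (1.0 - kappa_ext) * lambda_int

kappa_ext = external_convergence(kappa_d=0.01, kappa_s=0.04, kappa_ds=0.02)
lam = total_mst(kappa_ext, lambda_int=0.98)
print(f"kappa_ext = {kappa_ext:.4f}, total lambda = {lam:.4f}")
```

A few per cent in either factor shifts $H_0$ by the same few per cent through Eq. 62, which is why both the environment and the internal profile need independent constraints.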


The external line-of-sight lensing contribution can be estimated by tracers of the large-scale structure, either
using galaxy number counts (e.g., Greene et al., 2013; Rusu et al., 2017), or weak-lensing measurements (Tihhonova
et al., 2018). These measurements, paired with a cosmological model including a galaxy-halo connection, are able to
constrain the probability distribution of $\kappa_{\rm ext}$ to a few per cent per sight line.


Among those observations that are sensitive to the total MST $\lambda$, stellar kinematics is the most prominent and
commonly used one. The dynamics of stars is a direct tracer of the three-dimensional gravitational potential and
provides an independent mass estimate. Joint lensing and dynamics constraints have been used to provide
measurements of galaxy mass profiles (e.g., Grogin and Narayan, 1996; Romanowsky and Kochanek, 1999; Treu and
Koopmans, 2002). The modeling of the kinematic observables in lensing galaxies ranges in complexity from spherical
Jeans modeling (Binney and Tremaine, 2008) to Schwarzschild (Schwarzschild, 1979) methods.


Regardless of the approach, the prediction of any velocity dispersion $\sigma_v$ from any model can be decomposed into a
cosmology-dependent and a cosmology-independent part as (see e.g., Birrer et al., 2016, 2019):


$$\sigma_v^2 = \frac{D_{\rm s}}{D_{\rm ds}} c^2 J(\xi_{\rm lens}, \beta_{\rm ani}) , \tag{68}$$

where $c$ is the speed of light, $J$ is a dimensionless quantity dependent on the deflector model ($\xi_{\rm lens}$), the stellar
anisotropy distribution ($\beta_{\rm ani}$) and the observational conditions and luminosity-weighting within the aperture (e.g.,
Binney and Mamon, 1982; Treu and Koopmans, 2004; Suyu et al., 2010).
The constraints obtained from joint lensing and dynamics are either able to determine the MST component of
the deflector model, or provide additional cosmographic constraints on the relative expansion history through the
involved angular diameter distance ratio ($D_{\rm s}/D_{\rm ds}$, Eq. 68). When adding a time delay, the joint cosmographic
constraints from a combined analysis of time-delay, lensing, and dynamics can be translated into a two-dimensional
angular diameter distance plane (Birrer et al., 2016, 2019). When mapped into the $D_{\Delta t}$-$D_{\rm d}$ plane, the projection in
$D_{\rm d}$ is invariant under any pure MSD parameter $\lambda$ (Paraficz and Hjorth, 2009; Jee et al., 2015; Birrer et al., 2019)$^{10}$.


An alternative approach to constrain the MSD is with absolute lensing magnifications. The MSD transforms the
lensing magnification $\mu$ by:
$$\mu_\lambda = \lambda^{-2} \mu . \tag{69}$$


Thus, a known apparent unlensed brightness of an object $F_{\rm unl}$ with a measured flux $F_{\rm obs}$ can directly measure the
target magnification:
$$\mu = \frac{F_{\rm obs}}{F_{\rm unl}} . \tag{70}$$


Gravitationally lensed supernovae (glSNe) can provide, in addition to measurable time delays, lensing magnification
constraints when knowledge about the unlensed apparent brightness of the explosion is imposed. This measurement
does not require an absolute bolometric calibration of the exploding transient, but only a calibration relative to an unlensed field
(e.g., Kolatt and Bartelmann, 1998; Oguri and Kawano, 2003; Foxley-Marrable et al., 2018; Birrer et al., 2021).
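
Equations 69 and 70 can be combined to turn a flux measurement into an estimate of $\lambda$. The sketch below assumes that a lens model fitted with $\lambda = 1$ predicts a magnification `mu_model`, that the true mass distribution differs from it by a pure MST, and that microlensing and millilensing are negligible; the function names and the numbers are hypothetical.

```python
import math

def observed_magnification(f_obs, f_unl):
    """Eq. 70: magnification from the observed and unlensed fluxes."""
    return f_obs / f_unl

def mst_lambda_from_magnification(mu_model, mu_obs):
    """Invert Eq. 69, mu_obs = lambda**-2 * mu_model, for lambda > 0."""
    # Absolute values are used because image parity can make mu negative.
    return math.sqrt(abs(mu_model) / abs(mu_obs))

mu_obs = observed_magnification(f_obs=12.5, f_unl=1.0)
lam = mst_lambda_from_magnification(mu_model=10.0, mu_obs=mu_obs)
print(f"observed magnification = {mu_obs:.2f}, inferred lambda = {lam:.3f}")
```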


3.5.2 Sample selection 


The primary requirement to provide an absolute distance measurement is a measured relative time delay between the
images of a multiply imaged source. A time delay can only be measured if the source is bright and time-variable, or a transient.
The sources originally proposed by Refsdal (1964) were lensed supernovae, before the discovery of the strong-lensing
phenomenon on cosmological scales. The first extra-galactic lens discovered was a doubly lensed quasar (Walsh et al.,
1979). Lensed quasars were quickly identified as excellent sources for time-delay cosmography as they are variable
on short time scales, making the time-delay measurements possible, and they are sufficiently bright to be observed at
cosmological distances. Lensed quasars are typically found at redshift $z_{\rm s} \sim 1$-$3$, lensed by massive early-type galaxies
located around redshift $z_{\rm d} \sim 0.2$-$0.8$. This configuration typically produces multiple images separated by 1-3".


Strongly lensed quasars are rare objects on the sky. The discovery of currently known lensed quasars followed 
different paths. Some lenses were serendipitously discovered by visual inspection of astronomical images, in particular 
in the early days (e.g., Sluse et al., 2003). More recently, with the advent of large ground and space-based imaging
surveys, more systematic searches could be conducted, involving astrometric and color selections on post-processed
catalogs (Krone-Martins et al., 2018; Agnello et al., 2018; Lemon et al., 2019) and, most recently, machine learning
techniques applied directly to both catalogs and images. The discovery process proceeds in phases of increasing certainty
about the lensing nature, with increasing follow-up effort. The first step with wide-field surveys often results in hundreds
of candidates, of which a subset of the highest ranked candidates is followed up with spectroscopic observations to
confirm the identical redshift of the pair or quartet of quasar images, and with deep high-resolution imaging to detect 
the deflector galaxy and extended lensed features from the quasar host galaxy. 


The most prominent lensing systems being utilized are galaxy-scale lenses with quadruply imaged quasars. These
systems can offer several relative time delays, as well as additional constraints on the lens model from both the positions
of the quasar images and the often Einstein-ring-like lensed structure of the quasar host galaxy. Thus, a significant effort
in the search and follow-up work has been spent to find quadruply lensed quasars. Quadruply lensed quasars are
less frequent than doubly lensed quasars by a factor of about 5 (Oguri and Marshall, 2010). The more abundant
population of doubly lensed quasars provides fewer constraints per individual lens, but holds potential for a
population-level analysis.


More recently, the first multiply imaged supernovae were discovered in a galaxy cluster environment (Kelly et al., 
2015) and on a galaxy-scale lens (Goobar et al., 2017). This opens the path, as envisioned by Refsdal (1964), to use
lensed supernovae as the time-variable source to measure $H_0$, and with it the opportunity to utilize an entirely new
source population.


$^{10}$ $D_{\rm d}$ is still dependent on the LOS between observer and lens, $\kappa_{\rm d}$ (Eq. 63).


3.5.3 Measurements


In order to measure the distance $D_{\Delta t}$, or more generally the $D_{\Delta t}$-$D_{\rm d}$ combination, from a time-delay lens system
for cosmography, we need the following data products:


1. discovery of a lens with a time-variable source;
2. spectroscopic redshifts of the lens $z_{\rm d}$ and source $z_{\rm s}$;
3. time delays between the multiple images;
4. lens mass model to determine the Fermat potential;
5. lens environment studies to constrain external lensing effects related to the mass-sheet degeneracy.


The datasets required for each step are observationally cheap in comparison to other cosmological probes. However,
the combined analysis, even of a single lens, requires the coordination of multiple independent observations. The 
analysis can be impossible or severely limited in its precision and reliability by a single missing ingredient. For the 
discovery datasets, we refer to Sect. 3.5.2 and references therein. 


Spectroscopic redshifts. The spectroscopic redshifts of the quasar sources $z_{\rm s}$ are often easy to obtain given the
frequent emission lines in quasars. The redshift of the lens $z_{\rm d}$ can be challenging to measure since the bright quasar images
can outshine the lens galaxy. Getting $z_{\rm d}$ of lensed quasar systems often requires spectra taken under good seeing
conditions, to deblend the lensing galaxy from the quasar.


Time delays. Without measurements of a time delay, no constraints on the absolute distances involved can be inferred, 
and thus, regardless of the approach chosen, no direct constraints on the Hubble constant can be achieved. Relative 
time delays are measured with monitoring campaigns that extract light curves from the individual images. Lensed 
quasars with images separated by 1-3″ are sufficiently separated to be resolved with small ground-based telescopes. 
The monitoring of lensed quasars is thus challenging but possible with 1-m or 2-m class telescopes. To perform the 
measurement, several conditions need to be met: i) a photometric accuracy of a few milli-magnitudes is required to 
catch the low-amplitude variability signal, ii) a good sampling of the light curves is necessary if one targets the 
fast variations of small amplitude, and iii) the duration of the monitoring campaign needs to be sufficient to cover 
the duration of the time delays and to ensure that enough variations of the quasar are recorded. Furthermore, seasonal 
gaps are unavoidable in optical light curves since most lensed quasars are not visible all year long. In addition, 
extrinsic variations, caused mainly by the micro-lensing of the quasar images but also by a variety of other 
astrophysical effects, are often observed in the light curves. These extrinsic variations and gaps can severely bias 
time-delay measurements if not appropriately modeled. Once well-sampled light curves have been acquired, the next 
step consists of identifying features that can be matched in all light curves, and measuring the time delays. We refer 
to Vuissoz et al. (2007, 2008); Courbin et al. (2011); Tewes et al. (2013b); Eulaers et al. (2013); Rathna Kumar et al. 
(2013); Courbin et al. (2018a); Millon et al. (2020a) for recent measurements and methodology taking into account 
various aspects of model and data uncertainties. 
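
As an illustration of the matching step, the following is a minimal sketch, not the method of the works cited above. It assumes a toy intrinsic signal (`intrinsic`, a sum of two sinusoids), four observing seasons separated by gaps, 3 mmag photometric noise, and no micro-lensing; all function names and numerical values are hypothetical. The delay is estimated by shifting one light curve over a grid of trial delays and minimizing the mean squared mismatch, never interpolating across a seasonal gap.

```python
import numpy as np

def intrinsic(t):
    """Hypothetical smooth intrinsic variability in magnitudes (not a quasar model)."""
    return 0.15 * np.sin(2 * np.pi * t / 310.0) + 0.08 * np.sin(2 * np.pi * t / 97.0 + 1.0)

def simulate(true_delay=40.0, offset=0.7, sigma=0.003, seed=1):
    """Two image light curves over four seasons separated by gaps (all values assumed)."""
    rng = np.random.default_rng(seed)
    t = np.concatenate([s * 365.0 + np.sort(rng.uniform(0.0, 240.0, 80)) for s in range(4)])
    mag_a = intrinsic(t) + rng.normal(0.0, sigma, t.size)
    mag_b = intrinsic(t - true_delay) + offset + rng.normal(0.0, sigma, t.size)
    return t, mag_a, mag_b

def estimate_delay(t, mag_a, mag_b, trial_delays, max_gap=10.0, min_points=20):
    """Return the trial delay that minimizes the mean squared A-B mismatch."""
    best_delay, best_cost = None, np.inf
    for dt in trial_delays:
        t_shift = t - dt                     # B observed at t matches A at t - dt
        idx = np.searchsorted(t, t_shift)    # index of the A epoch just after t_shift
        inside = (idx > 0) & (idx < t.size)
        ok = np.zeros(t.size, dtype=bool)
        # Keep only points bracketed by two A epochs closer than max_gap,
        # so that we never interpolate across a seasonal gap.
        ok[inside] = (t[idx[inside]] - t[idx[inside] - 1]) < max_gap
        if ok.sum() < min_points:
            continue
        diff = mag_b[ok] - np.interp(t_shift[ok], t, mag_a)
        diff -= diff.mean()                  # free magnitude offset between the images
        cost = np.mean(diff**2)
        if cost < best_cost:
            best_delay, best_cost = dt, cost
    return best_delay

t, mag_a, mag_b = simulate()
delay = estimate_delay(t, mag_a, mag_b, np.arange(-100.0, 100.5, 0.5))
print(f"estimated delay: {delay:.1f} days (simulated truth: 40.0 days)")
```

Real analyses additionally model the extrinsic (micro-lensing) variability and propagate the uncertainties, which this toy example deliberately omits.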


Lens mass model. The Fermat Potential (Eq. 55) is a crucial component we need to know precisely to be able to 
use time-delay measurements to probe cosmic distances (Eq. 54). High-resolution imaging of gravitational lenses is a 
crucial observation to achieve a precise determination of the relative Fermat potential between multiple images of a 
time-variable source. Imaging modeling is primarily performed on high-resolution space-based Hubble Space Telescope 
(HST; Suyu et al., 2010; Birrer et al., 2016; Wong et al., 2017; Rusu et al., 2020), or ground-based adaptive-optics 
(AO; Chen et al., 2016a, 2019, 2021) imaging. To derive constraints on the lensing deflector from imaging data, all 
components that affect the imaging data need to be modeled and accounted for simultaneously with the lens model. 
This includes, but is not limited to, the extended source component of the AGN or transient host that is lensed, the 
image positions of the time variable source and its resulting point-like flux emission, the surface brightness of the 
deflector galaxy, differential dust extinction, and any other sources of surface brightness. In addition, instrument 
effects, such as the point spread function (PSF), noise (both shot-noise and instrumental noise), pixelization, and 
potential data reduction artifacts need to be accurately taken into account. Different techniques have been developed 
to jointly marginalize over a complex and unknown source morphology. These include regularized pixelated source 
reconstruction (e.g., Suyu et al., 2006, 2009), a set of basis functions such as shapelets (e.g., Birrer et al., 2015; 
Birrer and Amara, 2018), or parameterized surface brightness profiles, such as Sersic profiles. The surface brightness 
amplitude components of all these methods have in common that they create a linear response on the pixels. The 
maximum likelihood of the data given a proposed model for the amplitude components is thus a linear problem, and 
the Gaussian covariance matrix of the linear coefficients can be used to analytically marginalize over the prior (e.g., 
Suyu et al., 2006; Birrer et al., 2015). 
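
The linear step can be sketched as follows. This is a toy one-dimensional example under stated assumptions, not the implementation of any of the cited codes: the "image" is a row of 200 pixels, the basis consists of two Gaussian profiles and a constant background (all hypothetical), and the noise is Gaussian and uncorrelated. For fixed non-linear parameters (here the centers and widths), the best-fit amplitudes and their Gaussian covariance follow from a single weighted least-squares solve.

```python
import numpy as np

def gaussian(x, center, width):
    return np.exp(-0.5 * ((x - center) / width) ** 2)

def solve_linear_amplitudes(design, data, noise_sigma):
    """Weighted least squares: returns the best-fit amplitudes and their covariance."""
    weights = 1.0 / noise_sigma**2
    fisher = design.T @ (design * weights[:, None])   # A^T C^-1 A
    cov = np.linalg.inv(fisher)
    amplitudes = cov @ (design.T @ (data * weights))  # (A^T C^-1 A)^-1 A^T C^-1 d
    return amplitudes, cov

rng = np.random.default_rng(0)
x = np.linspace(-5.0, 5.0, 200)
# Each column is the pixel response of one basis component with unit amplitude.
design = np.column_stack([gaussian(x, 0.0, 0.5), gaussian(x, 1.5, 1.0), np.ones_like(x)])
true_amplitudes = np.array([3.0, 1.2, 0.1])
noise_sigma = np.full(x.size, 0.05)
data = design @ true_amplitudes + rng.normal(0.0, noise_sigma)

amplitudes, cov = solve_linear_amplitudes(design, data, noise_sigma)
print("amplitudes:", amplitudes)
print("1-sigma errors:", np.sqrt(np.diag(cov)))
```

In a full analysis this solve would be repeated for every proposed set of non-linear lens and source parameters, so only the non-linear parameters need to be sampled.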


In the absence of knowledge of an absolute source size or brightness, imaging data constraints cannot break the 
MST (as discussed in Sect. 3.5.1) or its generalization, the Source-Position-Transform (SPT; Schneider and Sluse, 
2014). The quantity that is constrained by imaging data along the radial direction is (Kochanek, 2002; Sonnenfeld, 
2018; Kochanek, 2020; Birrer, 2021): 

\[ \xi_{\rm rad} \equiv \frac{\theta_E \alpha''_E}{1 - \alpha'_E} \propto \frac{\theta_E \alpha''_E}{1 - \kappa_E}, \tag{71} \]

where $\alpha'_E$ is the first derivative and $\alpha''_E$ the second derivative of the deflection angle at the Einstein 
radius $\theta_E$, and $\kappa_E$ is the convergence at $\theta_E$. We refer to Birrer (2021) for a discussion on azimuthal constraints. 


The data currently used to break the MST are measurements of the lens velocity dispersion (see Eq. 68). 
The measurement is performed with high-spectral-resolution spectrographs on large ground-based, adaptive-optics- 
supported instruments such as Keck-DEIMOS, Keck-KCWI, or VLT-MUSE, targeting stellar absorption lines in the 
rest frame of the lensing galaxy. The velocity dispersion measurement is then a joint fit of the spectra taking into 
account the observing conditions, including the atmospheric absorption, the stellar templates matching the lensing 
galaxy type in age distribution and metallicity, and the dispersion width in the stellar distribution on top of the 
line-spread function. For measurements of velocity dispersion used in current time-delay cosmography studies we refer 
to Koopmans et al. (2003); Suyu et al. (2010, 2013); Courbin et al. (2011); Wong et al. (2017); Agnello et al. (2016); 
Sluse et al. (2019); Buckley-Geer et al. (2020). 


Line-of-sight and lens environment. Large-scale density perturbations and individual massive objects along the 
line of sight alter the lensing deflections. To first order, these effects can be captured as cosmic 
shear and convergence. The reduced cosmic shear term is a commonly used model component. The convergence 
component, however, is equivalent to an external mass sheet $\kappa_{\rm ext}$ (Eq. 66), and cannot be measured from 
imaging data. Higher-order effects from nearby groups or individual groups need to be explicitly modeled. Explicit 
modeling of individual groups has been done by, e.g., Fassnacht and Lubin (2002); Momcheva et al. (2006); Wilson et al. 
(2016); Sluse et al. (2017a). For theoretical aspects of the approximations made, and the regimes in which they hold, 
we refer to McCully et al. (2014, 2017); Birrer et al. (2017); Fleury et al. (2021b). 


Typically, methods that exploit knowledge of the galaxy-halo connection are employed, using 
luminous tracers of the underlying dark matter distribution. The most commonly used approach adopts galaxy 
number counts in different weighting schemes (e.g., Suyu et al., 2010; Greene et al., 2013; Rusu et al., 2017). The 
comparison of these weights (summary statistics) with control fields and numerical simulations with an imposed 
galaxy-halo connection allows the computation of the posterior density in $\kappa_{\rm ext}$. Weak lensing mass mapping is an 
alternative and complementary approach (Tihhonova et al., 2018, 2020). The data required for galaxy number counts 
are deep multi-band photometry within several square arcminutes of the deflector, and spectroscopy of the nearby 
galaxies and group identification (e.g., Rusu et al., 2017; Buckley-Geer et al., 2020). For weak lensing, deep 
space-based images are preferred, to reduce the shape noise and enhance the signal. 


3.5.4 Systematic effects 


Currently, the two main sources of uncertainty that, if not properly taken into account, can lead to systematic 
errors are the mass profile assumptions of the main deflectors and the selection effects of the lens sample used for 
the analysis. 


Mass profile assumptions. The dominant uncertainty in the current measurement of the Hubble constant with 
strong gravitational lensing time delays is attributed to uncertainties in the mass profiles of the main deflector 
galaxies. The currently employed model mitigating the MST effect is parameterized with a pure MST parameter $\lambda$ 
(Birrer et al., 2020). This parameterization is purely mathematical in nature, and leaves the physical interpretation 
(e.g., Blum et al., 2020) ambiguous or, in certain regimes, even unphysical, e.g. with mass profiles with negative 
density in the outskirts. Such a one-parameter extension to the previously considered simpler and more rigid mass 
profiles may also not encompass the necessary flexibility beyond the pure MST that can affect kinematics observations 
(e.g., Birrer et al., 2020; Yildirim et al., 2021). To make progress, the full degeneracy of the MST needs to be folded 
into flexible, but physically motivated, mass profile parameters, an approach explored by Shajib et al. (2021), but 
not yet employed for time-delay cosmography. The kinematics observations add further potential systematics 
to the inference of $H_0$ when employed to break the MST. The primary limitation of the kinematics is the mass- 
anisotropy degeneracy (Binney and Mamon, 1982), as well as projection effects in the light and mass profile, the 
de-projection assumptions employed, and rotation and ellipticity moments in the data. These assumptions have to 
be validated sufficiently to guarantee an unbiased interpretation of the mass density profiles, and hence of $H_0$, 
from time-delay cosmography. 
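
The effect of a pure MST can be illustrated numerically. The sketch below is a toy example under stated assumptions: a singular isothermal sphere with unit Einstein radius, a one-dimensional (axisymmetric) geometry, and the standard form of the transform, in which the deflection becomes $\lambda\alpha(\theta) + (1-\lambda)\theta$ and the source position is rescaled by $\lambda$. All function names are hypothetical. The image positions are unchanged, while the Fermat potential difference between the images, and hence the value of $H_0$ inferred from a fixed measured time delay, scales with $\lambda$.

```python
import numpy as np

THETA_E = 1.0  # Einstein radius in arcsec (assumed value)

def deflection(theta, lam=1.0):
    """SIS deflection, MST-transformed: lam * alpha(theta) + (1 - lam) * theta."""
    return lam * THETA_E * np.sign(theta) + (1.0 - lam) * theta

def lens_potential(theta, lam=1.0):
    """SIS lensing potential, MST-transformed with the same parameter."""
    return lam * THETA_E * np.abs(theta) + 0.5 * (1.0 - lam) * theta**2

def fermat_potential(theta, beta, lam=1.0):
    return 0.5 * (theta - beta) ** 2 - lens_potential(theta, lam)

def mst_check(lam, beta=0.3):
    """Return the lens-equation residuals and the Fermat-potential ratio under an MST."""
    images = np.array([beta + THETA_E, beta - THETA_E])  # SIS images for |beta| < THETA_E
    beta_lam = lam * beta                                 # the source position is rescaled
    residuals = images - deflection(images, lam) - beta_lam
    dphi = fermat_potential(images[0], beta) - fermat_potential(images[1], beta)
    dphi_lam = (fermat_potential(images[0], beta_lam, lam)
                - fermat_potential(images[1], beta_lam, lam))
    return residuals, dphi_lam / dphi

for lam in (0.9, 1.0, 1.1):
    residuals, ratio = mst_check(lam)
    print(f"lambda={lam:.1f}  max |residual|={np.abs(residuals).max():.1e}  "
          f"Fermat potential ratio={ratio:.3f}")
```

Since imaging data only constrain the image positions, they cannot distinguish between these models; an independent mass tracer such as stellar kinematics is needed to pin down $\lambda$.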


Selection effects. Strong lenses inherently trace a narrow and rare distribution of matter in the Universe. 
Quantifying the selection effects, including the differential selection effects among different samples of lenses, is 
going to be crucial to maintain accuracy in the years to come. Selection effects can impact the line-of-sight 
distribution, the main deflector mass density and ellipticity, the galaxy properties of the deflector as well as of the 
source, and projection effects. Many of these effects cannot be precisely quantified on a lens-by-lens basis. 

There are two approaches to mitigate selection effects. First, one can try to understand the selection from first 
principles, and explicitly account for the theoretical selection function in the analysis procedure. This approach 
requires extensive simulations and a reproducible selection function, including the discovery channel and follow-up 
decision. Second, one can empirically measure selection functions from a set of observables at hand, with assumptions 
of self-similarity among galaxies and lines of sight with identical properties, such as stellar mass, morphology, 
redshift and environment, and explore empirical scaling relations among them. With the anticipated large number of 
lenses in the near future, and the more uniform datasets of large and deep surveys, both approaches will become 
feasible, and we advocate analyses that take into account the specific discovery channel. 


The two limiting systematics, the mass profile assumptions and selection effects, result in uncertainties on the 
combined $H_0$ measurement of a few per cent. Pinning down these systematics to sub-percent levels with new 
observations and methodology is a major current undertaking of the field. 


3.5.5 Main results 


The H0LiCOW collaboration (Suyu et al., 2017) inferred from the independent analysis of six lensed quasar systems 
(Suyu et al., 2010, 2013; Wong et al., 2017; Bonvin et al., 2017; Birrer et al., 2019; Chen et al., 2019; Rusu et al., 
2020) a Hubble constant value of $H_0 = 73.3^{+1.7}_{-1.8}$ km s$^{-1}$ Mpc$^{-1}$, describing deflector mass density profiles by 
either a power-law or stars (constant mass-to-light ratio) plus standard dark matter halos (Wong et al., 2020). This is a 
2% precision on $H_0$, in excellent agreement with the local distance ladder measurement by the SH0ES team (Riess 
et al., 2019, 2021a) and in more than $3\sigma$ statistical tension with early-Universe probes (e.g., Planck Collaboration 
et al., 2020c; Aiola et al., 2020). The STRIDES collaboration presented an additional lens with the most precise 
single-lens measurement of $H_0 = 74.2^{+2.7}_{-3.0}$ km s$^{-1}$ Mpc$^{-1}$, with the same mass profile assumptions as the H0LiCOW 
collaboration (Shajib et al., 2020). Millon et al. (2020b) found, combining six lenses from H0LiCOW, SHARP and STRIDES, 
that the result holds when all lenses are assumed to follow either one or the other of the two previously assumed 
forms of the mass density profile. In sum, if the mass density profiles are well described by a power-law or a constant 
mass-to-light ratio plus a Navarro-Frenk-White (NFW, Navarro et al., 1997) dark matter halo$^{11}$, and covariant 
assumptions and priors are negligible, the tension from the strong lensing measurements alone with early-Universe 
results is significant, corroborating other measurements, and new physics may be required. 


The attention thus turned to relaxing the radial profile assumption (see Sect. 3.5.4) and to the covariant treatment 
of population priors that cannot be constrained on a lens-by-lens basis. Birrer et al. (2020) addressed the issue in the 
most direct way, by choosing a parameterization of the radial mass density profile that is maximally degenerate with 
$H_0$, via the MST. With this more flexible parameterization, $H_0$ is only constrained by the measured time delays and 
stellar kinematics, increasing the uncertainty on $H_0$ from 2% to 8% for the TDCOSMO sample of 7 lenses and resulting 
in $H_0 = 74.5^{+5.6}_{-6.1}$ km s$^{-1}$ Mpc$^{-1}$, without changing the mean inferred value significantly. 


Birrer et al. (2020) introduced a hierarchical framework in which external datasets can be combined with the time- 
delay lenses to improve the precision. They achieved a 5% precision measurement on $H_0$ by combining the TDCOSMO 
lenses with stellar kinematic measurements of a sample of lenses from the Sloan Lens ACS (SLACS) survey with no 
time-delay information (Bolton et al., 2008; Auger et al., 2009; Shajib et al., 2021), and measured 
$H_0 = 67.4^{+4.1}_{-3.2}$ km s$^{-1}$ Mpc$^{-1}$. The mean of the TDCOSMO+SLACS measurement is offset with respect to the 
TDCOSMO-only value, in the direction of the CMB value, although still statistically consistent given the uncertainties. 
The Birrer et al. (2020) measurements are in statistical agreement with each other and with the earlier 
H0LiCOW/SHARP/STRIDES measurements based on radial mass profile assumptions. Birrer et al. (2020) is also consistent, 
by construction, with the study by Shajib et al. (2021), since they share the same measurements for SLACS. Shajib et al. 
(2021) concluded that using a mass profile combining an NFW profile for the dark matter component and stars$^{12}$ is a 
sufficiently accurate description of the mass density profile of the SLACS lenses. However, small departures from 
those forms are allowed by the data, resulting in the uncertainties quoted by Birrer et al. (2020). The shift in the 
mean could be real or it could be due to an intrinsic difference between the deflectors in the TDCOSMO and SLACS 
samples, arising from selection effects. For example, the two samples could be well matched in stellar velocity 
dispersion but differ in redshift, or the TDCOSMO sample could be source selected and composed mostly of quadruply 
imaged quasars, while SLACS is deflector selected and dominated by doubly imaged galaxies. 

11 Imposing standard priors on the mass and concentration of the halo. 


3.5.6 Outlook in the near future 


On the full sky, we expect several tens of thousands of galaxy-galaxy lenses and several hundred quadruply lensed 
quasars to exist (e.g., Oguri and Marshall, 2010; Collett, 2015). With the upcoming wide and deep ground- and 
space-based surveys, we expect many of those to be discovered within a decade by the Vera Rubin Observatory (LSST 
Science Collaboration et al., 2009), Nancy Grace Roman Space Telescope (Spergel et al., 2015), and Euclid (Laureijs 
et al., 2011). This is a large increase in the number of lenses possibly suitable for time-delay analyses compared to 
the current analyses conducted on a few lenses (e.g., 7 lenses in the case of current TDCOSMO results) and will 
transform the measurements and approaches in the domain of time-delay cosmography. The first step in utilizing these 
lenses is to discover them in large datasets. The next step is to acquire all the necessary follow-up information, 
from monitoring data for a time-delay measurement and high-resolution imaging to spectroscopic information about the 
source and lens redshifts as well as the velocity dispersion of the deflector. This step is going to be challenging 
with limited resources, and decisions will need to be made about which lenses are followed up extensively and which 
ones are left aside. Some lenses might require less substantial follow-up in cases where Rubin light curves are good 
enough for a time-delay measurement, and/or where high-resolution and sufficiently high signal-to-noise ratio data 
exist from wide-field space surveys, such as Euclid or Roman. 


The key to assessing the need for follow-up, and deciding on which lenses to spend it, is the extent to which these 
datasets impact the precision on $H_0$. Follow-up decisions, besides the limited resources, are currently also 
impacted by the accessibility of adaptive optics (AO) coverage. With next-generation AO instrumentation in both 
hemispheres, we expect a full-sky coverage of instrumentation that allows the community, at least from a technical 
viewpoint, to target every single gravitational lens on the sky. 


The dominant uncertainty in the current measurement of the Hubble constant with strong gravitational lensing 
time delays is attributed to uncertainties in the mass profiles of the main deflector galaxies. There are several 
independent avenues of data available in the near future to approach a 1% measurement of $H_0$, on which we focus in 
this section. 


Spatially resolved kinematics of the deflector galaxy with the next generation of space-based (JWST, Gardner et al., 
2006) and ground-based (ELTs) instruments provide precise measurements of the kinematics and have the ability to break 
the mass-anisotropy degeneracy, a currently limiting systematic when using integrated kinematic measurements. 
Birrer and Treu (2021) forecast that with 40 time-delay lenses with exquisite spatially resolved kinematics, a 1.5% 
precision on $H_0$ can be achieved without relying on mass-density profile assumptions to break the MST, as shown 
in the left panel of Fig. 19 (see also, e.g., Yildirim et al., 2020, 2021). Resolved spectroscopy can also be employed 
on non-time-delay lenses without bright and contaminating quasar images, which can further improve the kinematic 
measurement precision and enlarge the dataset (Birrer et al., 2020; Birrer and Treu, 2021). 


Standardizable magnifications with gravitationally lensed supernovae (glSNe) provide another promising avenue 
to constrain the MST in the near future with the onset of Rubin. As reported in the right panel of Fig. 19, Birrer et al. 
(2021) provide a forecast with glSNe constraining $H_0$ independently of stellar kinematics. They conclude that 
the standardizable nature enables a 1.5% $H_0$ measurement with a 10-year Rubin survey. On the discovery, the expected 
number of glSNe, the challenges of following them up, and the caveats of micro-lensing, we refer to Goldstein et al. 
(2018); Foxley-Marrable et al. (2018); Wojtak et al. (2019); Goldstein et al. (2019); Huber et al. (2021); Birrer et al. 
(2021). 

12 Using wider priors on mass and concentration than earlier H0LiCOW/SHARP/STRIDES measurements. 

[Figure 19, two panels. Left panel, "Spatially resolved kinematics" (horizontal axis: $H_0$): legend entries for 40 
TDCOSMO lenses with different kinematic datasets, including AO-IFU (5%) and ELT-IFU (3%). Right panel, "Standardizable 
magnifications" (horizontal axis: number of strongly lensed supernovae of Type Ia): legend entries for the Pantheon 
and Roman SNe samples and for realistic, extreme, and no microlensing, with a marker at ~10 years of LSST.] 

Figure 19: Forecast for $H_0$ measurements in the near future with the upcoming ground- and space-based 
facilities. Left: Spatially resolved kinematics measurements of a sample of 40 time-delay lenses enable a precision 
on $H_0$ of 1.5% (Figure adopted from Birrer and Treu (2021)). Right: Standardizable magnification measurements 
of ~144 gravitationally lensed supernovae enable a precision on $H_0$ of 1.5% (Figure adopted from Birrer et al. 
(2021)). Both approaches constrain the MST with independent observations. 


In summary, in the next decade, with the increasing number of lenses and the improved data quality, a 
~1% measurement of the Hubble constant becomes feasible, provided that major efforts are also invested in 
validation and in the treatment of possible covariant systematics. 

3.6 Cosmography with Cluster Strong Lensing 


While Sect. 3.5 considers strong lensing effects produced by galaxy-scale lenses on intrinsically variable sources, this 
section focuses on much more massive structures in the Universe, galaxy clusters. In particular, we illustrate the 
principles of cluster strong lensing cosmography. 


3.6.1 Basic idea and equations 


For simplicity, we use the thin-screen approximation, i.e. we assume that the lens total mass distribution is confined 
on a plane, called the lens plane. In addition, we assume a single lens plane. The equations described in Sect. 3.5.1 
remain valid in this context. The measurement of relative time-delays between the multiple images of intrinsically 
variable sources lensed by galaxy clusters can be used to constrain cosmological parameters such as $H_0$. 


Due to their large mass, galaxy clusters can have large cross sections for strong lensing. The size of these cross 
sections depends on several properties of the lenses, including their total mass, dynamical state, ellipticity and 
asymmetry (Torri et al., 2004; Hennawi et al., 2007; Meneghetti et al., 2007, 2010). It is not uncommon that massive 
clusters strongly lens several tens of background sources simultaneously (e.g., Postman et al., 2012; Lotz et al., 
2014; Coe et al., 2019; Steinhardt et al., 2020; Caminha et al., 2017b, 2019; Lagattuta et al., 2019; Bergamini et al., 
2021b). In this case, additional constraints on the cosmological parameters can be set, even with sources that are 
not intrinsically variable and for which relative time-delays cannot be measured. 


Equations 48, 49, and 50 show that the difference between the observed and intrinsic positions of a source whose 
light is deflected by a gravitational lens is the product of two factors. The first factor is the deflection angle 
$\hat{\vec{\alpha}}(\vec{\theta})$, which is proportional to the two-dimensional gradient of the integral of the lens 
Newtonian gravitational potential $\Phi$ along the line-of-sight: 

\[ \hat{\vec{\alpha}}(\vec{\theta}) = \frac{2}{c^2} \vec{\nabla}_\perp \int \Phi \, dz . \tag{72} \]

Thus, the deflection angle depends on the lens total mass distribution. 

The second factor is the ratio between the angular diameter distances $D_{ds}$ and $D_s$. In a flat cosmological model, 
the angular diameter distance to redshift $z$ is given by: 

\[ D_A(z) = \frac{c}{H_0} \frac{1}{1+z} \int_0^z \frac{dz'}{\sqrt{\Omega_m (1+z')^3 + (1-\Omega_m)(1+z')^{3(1+w)}}} , \tag{73} \]

where $w$ is the EoS parameter for the dark energy. Thus, the angular diameter distance depends on the values of 
cosmological parameters, such as $H_0$, $\Omega_m$, and $w$. The ratio of two angular diameter distances does not depend on 
$H_0$. 
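
A minimal numerical sketch of the flat-model distance integral, under stated assumptions: the function names are hypothetical, the distance between two redshifts uses the standard flat-space expression (the same integral taken from the lens to the source redshift), radiation is neglected, and the integral is evaluated with a simple trapezoidal rule. The last lines check that the ratio $D_{ds}/D_s$ does not change with $H_0$.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light in km/s

def inv_e_integral(z_lo, z_hi, omega_m, w, n=2048):
    """Trapezoidal integral of 1/E(z) between z_lo and z_hi for flat wCDM."""
    z = np.linspace(z_lo, z_hi, n)
    inv_e = 1.0 / np.sqrt(omega_m * (1 + z) ** 3 + (1 - omega_m) * (1 + z) ** (3 * (1 + w)))
    dz = z[1] - z[0]
    return dz * (inv_e.sum() - 0.5 * (inv_e[0] + inv_e[-1]))

def angular_diameter_distance(z1, z2, h0=70.0, omega_m=0.3, w=-1.0):
    """Angular diameter distance in Mpc between z1 and z2 (flat model only)."""
    return C_KM_S / h0 / (1 + z2) * inv_e_integral(z1, z2, omega_m, w)

z_d, z_s = 0.5, 2.0
for h0 in (50.0, 70.0, 100.0):
    d_s = angular_diameter_distance(0.0, z_s, h0=h0)
    d_ds = angular_diameter_distance(z_d, z_s, h0=h0)
    print(f"H0={h0:5.1f}  D_s={d_s:8.1f} Mpc  D_ds={d_ds:8.1f} Mpc  D_ds/D_s={d_ds / d_s:.6f}")
```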


For simplicity, we consider a circularly symmetric lens and choose to measure the angular positions $\theta$ and $\beta$ with 
respect to the lens center. The deflection angle for any circularly symmetric mass distribution is: 

\[ \hat{\alpha}(\theta) = \frac{4 G M(|\theta|)}{c^2 D_d |\theta|} . \tag{74} \]

Inserting Eq. 74 into Eq. 49, we obtain that the image of a source perfectly aligned with the lens and the observer 
($\beta = 0$) is a ring, whose angular size is: 

\[ \theta_E(z_d, z_s) = \sqrt{ \frac{4 G M[\theta_E(z_d, z_s)]}{c^2} \frac{D_{ds}}{D_d D_s} } . \tag{75} \]

This radius is called the Einstein radius. The mass $M(\theta_E)$ is the projected mass enclosed by the ring. 
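
For orientation, the Einstein radius expression can be evaluated numerically. The sketch below makes several assumptions that are not in the text: the enclosed projected mass is treated as a fixed number (so no self-consistent solution for $M(\theta_E)$ is attempted), and the distances are illustrative round values roughly appropriate for $z_d = 0.5$ and $z_s = 2$ in a flat model with $H_0 = 70$ km s$^{-1}$ Mpc$^{-1}$ and $\Omega_m = 0.3$. The function name is hypothetical.

```python
import math

G = 6.67430e-11          # m^3 kg^-1 s^-2
C = 299792458.0          # m/s
M_SUN = 1.98847e30       # kg
MPC = 3.0856775814913673e22  # m
ARCSEC = math.pi / (180.0 * 3600.0)  # radians per arcsec

def einstein_radius_arcsec(mass_msun, d_d_mpc, d_s_mpc, d_ds_mpc):
    """Einstein radius for a fixed enclosed projected mass (distances in Mpc)."""
    mass_term = 4.0 * G * mass_msun * M_SUN / C**2           # metres
    distance_term = d_ds_mpc / (d_d_mpc * d_s_mpc) / MPC     # 1/metres
    return math.sqrt(mass_term * distance_term) / ARCSEC

# Illustrative, rounded distances (assumed, see the lead-in above).
theta_cluster = einstein_radius_arcsec(1e14, 1260.0, 1730.0, 1100.0)
theta_galaxy = einstein_radius_arcsec(1e11, 1260.0, 1730.0, 1100.0)
print(f"cluster-scale mass: {theta_cluster:.1f} arcsec; galaxy-scale mass: {theta_galaxy:.2f} arcsec")
```

The square-root dependence on the enclosed mass is why cluster-scale lenses produce image separations of tens of arcseconds while galaxy-scale lenses produce separations of about an arcsecond.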


In the case of two sources at redshifts $z_{s,1}$ and $z_{s,2}$ aligned with the lens and the observer, the ratio of the 
corresponding Einstein radii is given by: 

\[ \frac{\theta_E(z_d, z_{s,1})}{\theta_E(z_d, z_{s,2})} = \sqrt{ \frac{M[\theta_E(z_d, z_{s,1})]}{M[\theta_E(z_d, z_{s,2})]} \, \frac{D_{ds}(z_d, z_{s,1}) \, D_s(z_{s,2})}{D_s(z_{s,1}) \, D_{ds}(z_d, z_{s,2})} } . \tag{76} \]

Figure 20: Sensitivity of the family ratio to the values of the cosmological parameters $\Omega_m$ and $w_0$. The left 
panel shows the family ratio for a lens at redshift $z_d = 0.5$ in a flat ΛCDM cosmological model with $\Omega_m = 0.3$, 
$w(z) = w_0 = -1$, and $H_0 = 70$ km s$^{-1}$ Mpc$^{-1}$. We assume $z_{s,1} = 1$. The solid black curve describes how the 
family ratio varies as a function of the second source redshift, $z_{s,2}$. The shaded blue region indicates the 95% 
prediction interval estimated by sampling the parameter plane $\Omega_m$-$w_0$ assuming uniform priors ($\Omega_m \in [0.1, 1]$ and 
$w_0 \in [-2, -0.5]$). The right panel shows the results of the Sobol sensitivity analysis. We show the first-order 
Sobol index for both parameters as a function of $z_{s,2}$. 

Figure 21: Similar to Fig. 20, but showing the sensitivity of the time-delay distance, $D_{\Delta t}(z_s)$, to the values of 
the cosmological parameters $H_0$, $\Omega_m$, and $w_0$. 


The function: 

\[ \Xi(z_d; z_{s,1}, z_{s,2}) = \frac{D_{ds}(z_d, z_{s,1}) \, D_s(z_{s,2})}{D_s(z_{s,1}) \, D_{ds}(z_d, z_{s,2})} \tag{77} \]

is called the family ratio, and depends on the values of cosmological parameters, such as $\Omega_m$ and $w$. This result 
holds also in the case of sources not perfectly aligned with the lens and the observer, or for lenses whose total mass 
distribution is not circular. The general principle is that the relative positions of multiple image families depend 
both on the lens mass distribution and the family ratios. 


In the left panel of Fig. 20, we show the family ratio for a lens at redshift $z_d = 0.5$ in a flat ΛCDM cosmological 
model with $\Omega_m = 0.3$, $w(z) = w_0 = -1$, and $H_0 = 70$ km s$^{-1}$ Mpc$^{-1}$. We assume $z_{s,1} = 1$. The solid black curve 
describes how the family ratio varies as a function of the second source redshift $z_{s,2}$. The shaded blue region 
indicates the 95% prediction interval estimated by sampling the parameter plane $\Omega_m$-$w_0$ using Saltelli's scheme 
(Saltelli, 2002). We assume uniform priors on the cosmological parameters, with $\Omega_m \in [0.1, 1]$ and 
$w_0 \in [-2, -0.5]$. Performing a Sobol sensitivity analysis (Sobol', 2001; Saltelli et al., 2010), we find that ~60-70% 
of the variance of the family ratio is due to the variance of $\Omega_m$, as indicated by the first-order Sobol index $S_1$ 
plotted in the right panel. The contribution of the $w_0$ variance amounts to ~10-25%, while the remainder of the 
variance is due to second-order interactions between $\Omega_m$ and $w_0$. Thus, the family ratio is primarily sensitive to 
$\Omega_m$, but it is also sensitive to the dark energy equation of state. 

Figure 22: Degeneracy between the parameters $w_0$ and $\Omega_m$, derived by fitting 45 family ratios (obtained by 
combining 10 multiply imaged sources uniformly distributed between $z = 1$ and $z = 6$). The dashed lines indicate 
the true values of the cosmological parameters. 


As shown in Fig. 21, performing a similar analysis for the time-delay distance (assuming again flat priors on 
the cosmological parameters, with $H_0 \in [50, 100]$ km s$^{-1}$ Mpc$^{-1}$, $\Omega_m \in [0.1, 1]$ and $w_0 \in [-2, -0.5]$), we find that 
the variance of this quantity is mostly contributed by the variance of $H_0$ (~90-95%), while the sensitivity to 
other cosmological parameters is much weaker. Thus, these results suggest that the family ratio and the time-delay 
distance are highly complementary cosmological probes. 


The existing degeneracy between the parameters $w_0$ and $\Omega_m$ estimated from the family ratios of several multiply 
imaged sources is illustrated in Fig. 22. We assume a fit to 45 family ratios obtained by combining 10 multiply imaged 
sources uniformly distributed in redshift between $z = 1$ and $z = 6$. The confidence contours (at 1, 2, and $3\sigma$) do not 
account for the uncertainties related to lens modeling (see the discussion in Sect. 3.6.5). As we see, the degeneracy 
is strong: we obtain similar family ratios in cosmologies with a high value of $w_0$ and a low value of $\Omega_m$, and vice 
versa. Breaking the degeneracy requires increasing the number of constraints, either by accumulating a larger number of 
multiple image families by means of deeper observations of single clusters or by stacking multiple lenses (Gilmore and 
Natarajan, 2009). 


3.6.2 Sample selection 


Currently, only five lens galaxy clusters with multiple images of time-varying sources (3 QSOs and 2 SNe) and 
measured time-delays are known. Systematic searches for gravitationally lensed quasars over a range of angular 
separations have ramped up in the last 20 years with the availability of the Sloan survey. The SDSS Quasar Lens 
Search (SQLS, Oguri et al., 2006) used a combination of morphological and color selection criteria applied to a SDSS 
sample of spectroscopically confirmed QSOs to find over 200 candidate strongly lensed QSOs. A few of these were 
found with angular separations exceeding 10”, characteristic of group-cluster scale lenses; remarkable examples are 
SDSS J1004+4112 (Inada et al., 2003), a five-image lensed QSO with a separation of ~15”, and SDSS J1029+2623 
(Inada et al., 2006), a three-image system with the largest known separation to date (~23”). The Sloan Giant Arcs 
Survey (Hennawi et al., 2008), a study largely based on visual inspection of strong lensing features around massive 
clusters, has discovered SDSS J2222+2745 (Dahle et al., 2013), with six detected images of a QSO at z = 2.2 with 
a separation of ~15”. More advanced methods to search for multiply lensed quasars based on machine-learning 
techniques, which work directly on image cutouts using neural network pattern recognition methods, have been 
developed in recent years (e.g. Agnello et al., 2015; Petrillo et al., 2017; Metcalf et al., 2019; Cañameras et al., 
2020), and are being applied to new wide-area optical surveys such as the Dark Energy Survey (Huang et al., 2020). 
Machine-learning methods have also recently been applied to search for strongly lensed quasars selected from Gaia 
catalogs, in combination with near-IR surveys to identify likely lenses (e.g. Stern et al., 2021). These techniques 
were initially developed to discover lensed QSO systems with small separations (a few arcsec); however, they can be 
easily extended to cluster-scale lenses. 


The Vera C. Rubin Observatory project (LSST Science Collaboration et al., 2009) will discover hundreds of new 
multiply imaged QSOs and SNe (a few tens of which will be lensed by galaxy clusters) and will measure their time- 
delays (Oguri and Marshall, 2010). The latter can require time-consuming monitoring campaigns, particularly in the 
case of cluster-scale lenses. 


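By assuming a singular isothermal sphere (SIS) profile for the total mass distribution of the lens (cluster), the 
time-delay between the two multiple images of the same background source can vary between 0 and a maximum 
value given by: 

\[ \Delta t_{\rm SIS,max} = \frac{1+z_d}{c} \frac{D_d D_{ds}}{D_s} \, 32\pi^2 \left( \frac{\sigma_{\rm SIS}}{c} \right)^4 = 127.5 \, (1+z_d) \left( \frac{D_d D_{ds}/D_s}{1\,{\rm Gpc}} \right) \left( \frac{\sigma_{\rm SIS}}{1000\,{\rm km\,s^{-1}}} \right)^4 {\rm yr} , \tag{78} \]

where $\sigma_{\rm SIS}$ is the value of the effective velocity dispersion associated with the isothermal total mass profile. 
Alternatively, one can write: 

\[ \Delta t_{\rm SIS,max} = \frac{D_{\Delta t}}{c} \, 2\theta_E^2 = \left( \frac{D_{\Delta t}}{L_H} \right) \frac{2\theta_E^2}{H_0} = \left( \frac{D_{\Delta t}}{L_H} \right) 0.66 \, \theta_E^2({\rm arcsec}) \ {\rm yr} , \tag{79} \]

where $D_{\Delta t}$ is the time delay distance (Eq. 54), $L_H = c H_0^{-1}$ is the Hubble length, so that $D_{\Delta t}/L_H < 1$, and $\theta_E$ is the 
Einstein radius, ranging from a few arcsec to ~15” ($H_0 = 70$ km s$^{-1}$ Mpc$^{-1}$ is adopted). 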
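
The numerical prefactors in the two expressions for the maximum SIS time delay can be checked directly. The sketch below is a plain unit conversion with hypothetical function names; it assumes a Julian year of 365.25 days and takes the normalizations to be 1 Gpc for the distance combination and 1000 km s$^{-1}$ for the velocity dispersion.

```python
import math

C = 299792458.0                       # m/s
MPC = 3.0856775814913673e22           # m
YEAR = 365.25 * 86400.0               # s (Julian year, assumed)
ARCSEC = math.pi / (180.0 * 3600.0)   # radians per arcsec

def dt_sis_max_from_sigma(z_d, d_comb_gpc, sigma_km_s):
    """Maximum SIS delay in years; d_comb_gpc is D_d * D_ds / D_s in Gpc."""
    seconds = ((1.0 + z_d) / C * d_comb_gpc * 1e3 * MPC
               * 32.0 * math.pi**2 * (sigma_km_s * 1e3 / C) ** 4)
    return seconds / YEAR

def dt_sis_max_from_theta_e(ddt_over_lh, theta_e_arcsec, h0=70.0):
    """Maximum SIS delay in years from D_dt / L_H and the Einstein radius."""
    hubble_time = MPC / (h0 * 1e3)    # 1/H0 in seconds
    return 2.0 * (theta_e_arcsec * ARCSEC) ** 2 * ddt_over_lh * hubble_time / YEAR

print(f"sigma form, unit arguments:   {dt_sis_max_from_sigma(0.0, 1.0, 1000.0):.1f} yr")
print(f"theta_E form, unit arguments: {dt_sis_max_from_theta_e(1.0, 1.0):.2f} yr")
print(f"theta_E = 15 arcsec, D_dt/L_H = 0.7: {dt_sis_max_from_theta_e(0.7, 15.0):.0f} yr")
```

Because the delay grows with the square of the Einstein radius, cluster-scale separations of ~15” translate into delays of up to the order of a century, compared with less than a year for arcsecond-scale galaxy lenses.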


Cluster-scale multiply lensed quasars (e.g. Dahle et al., 2015; Fohlmeister et al., 2013), as well as the multiple 
images of SN Refsdal (Kelly et al., 2015) and SN Requiem (Rodney et al., 2021) do indeed show model-predicted 
and measured time-delays spanning from a few days to years and tens of years. 


3.6.3 Measurements 


To constrain cosmology using the strong lensing cosmography approach, one has to simultaneously fit as many strong 
lensing constraints as possible, using a model that incorporates the cluster mass distribution, and the cosmological 
parameters. This process is called lens inversion. There are two general classes of inversion algorithms. The first 
approach is called free-form: the cluster is subdivided into a mesh onto which the lensing observables are 
mapped, and which is then transformed into a pixelized mass distribution. The second class comprises parametric models 
(e.g. Kneib et al., 1996; Jullo et al., 2007; Jullo and Kneib, 2009), wherein the mass distribution is reconstructed 
by combining clumps of matter on different scales. One or more large-scale mass components are used to describe 
the diffuse cluster dark matter halo. They are often positioned where the brightest cluster galaxies are located. The 
other cluster galaxies are used to trace the cluster substructure. Both the large and small scale mass components 
have density profiles given by analytic functions. 


Using either of these approaches, the cluster mass distribution is described by a set of parameters (which can 
be a set of pixel values or parameters describing the shape and density profiles of each mass clump). Let $p$ be the 
totality of the parameters used to model the cluster mass distribution, and $p_{\rm cosmo}$ the cosmological parameters we 
want to estimate. 


The strong lensing constraints are generally in the form of positions of multiple images of several sources. These 
images are identified based on the morphology and color similarities of the lensed features. Gravitational lensing 
conserves the source surface brightness, implying that several source properties (e.g., star forming regions, spiral 
arms, bulges, etc.) can be recognized in all their multiple images. The geometry of the lens, inferred from the 
spatial distribution of the cluster galaxies, is useful to find counter-images of a given source. Typically, the cluster 
galaxies in the central regions of galaxy clusters are early-type galaxies, most of which can be recognized because they 
populate a red sequence in the color-magnitude diagram. More sophisticated methods to identify these galaxies and 
separate them from foreground and background sources also include deep-learning models trained using multi-band 
images (Angora et al., 2020). Thus, finding multiple images and cluster members requires high resolution multi-band 
imaging observations that only the Hubble Space Telescope (HST) can currently deliver. 


Candidate multiple images can be confirmed by verifying that they have similar spectra. Spectroscopy is also 
crucial to measure the redshifts of lenses and sources. Without redshifts it is impossible to convert angular scales 
into physically meaningful units. 


The multiple images of the same source form a family. Each family provides some constraints on the lens 
deflection field. Indeed, given a source at the intrinsic angular position $\beta$, the positions of its images, $\theta_i$, satisfy 
the lens equation (Eq. 49). In the case of intrinsically variable sources, such as Supernovae or QSOs, we can derive 
additional constraints by measuring the relative time-delays between the multiple images. In the following equations, 
we assume that both positional and time-delay measurements are available for the lensing analysis. Nevertheless, we 
remark once more that the strong lensing cosmography approach can be used to estimate the values of cosmological 
parameters, such as those of $\Omega_m$, $\Omega_{de}$, and $w$, even without measuring time-delays. 


The cluster potential and the cosmological model are constrained simultaneously by maximizing the posterior 
probability distribution: 


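$$P(p \frown p_{\rm cosmo} \,|\, \theta^{\rm obs} \frown \Delta t^{\rm obs}) \propto P(\theta^{\rm obs} \frown \Delta t^{\rm obs} \,|\, p \frown p_{\rm cosmo})\, P(p \frown p_{\rm cosmo}), \qquad (80)$$

where $\theta^{\rm obs}$ and $\Delta t^{\rm obs}$ are the observed positions and relative time-delays of the multiple images, respectively. The 
symbol $\frown$ denotes the concatenation operator. The model likelihood is given by: 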


$$\mathcal{L}(p \frown p_{\rm cosmo} \,|\, \theta^{\rm obs} \frown \Delta t^{\rm obs}) \propto \exp\left[-\frac{1}{2}\chi^2(p \frown p_{\rm cosmo})\right]. \qquad (81)$$

Since the datasets (positions and time-delays) are independent, the likelihood is separable. Thus the $\chi^2(p \frown p_{\rm cosmo})$ 
function is the sum of two terms. The first quantifies the separation between the observed and the model-predicted 
multiple image positions: 

$$\chi^2_{\rm pos} = \sum_{i=1}^{N_{\rm fam}} \sum_{j=1}^{n_i} \left( \frac{\left| \theta^{\rm obs}_{ij} - \theta^{\rm pred}_{ij} \right|}{\sigma_{\theta_{ij}}} \right)^2, \qquad (82)$$

where $\theta^{\rm obs}_{ij}$ and $\theta^{\rm pred}_{ij}$ are the observed and model-predicted positions of the j-th multiple image belonging to the 
i-th family, $N_{\rm fam}$ is the total number of multiple image families, and $n_i$ is the number of multiple images belonging 
to the i-th family. The uncertainty on the image positions, $\sigma_{\theta_{ij}}$, is generally unknown. It depends not only on the 
effective resolution of the observations (i.e. the pixel scale and the size of the point spread function), but also on 
several properties of the lens not directly accounted for in the lens model (such as unseen substructures in the cluster 
or along the line-of-sight or asymmetries of the dark matter distribution). Generally, this uncertainty is scaled to 
obtain a value of reduced $\chi^2$ of ~ 1 (Bergamini et al., 2021b). 


The second term quantifies the difference between the observed and model-predicted relative time-delays: 
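
$$\chi^2_{\rm td} = \sum_{i=1}^{N_{\rm fam,td}} \sum_{j=1}^{n_{i,{\rm td}}-1} \left( \frac{\Delta t^{\rm obs}_{ij} - \Delta t^{\rm pred}_{ij}}{\sigma_{\Delta t_{ij}}} \right)^2, \qquad (83)$$

where $N_{\rm fam,td}$ is the number of families of multiple images with measured time-delays, $n_{i,{\rm td}}$ is the number of multiple 
images of the i-th family (note that this implies $n_{i,{\rm td}} - 1$ relative time-delay measurements after choosing the $n_{i,{\rm td}}$-th 
image as reference), $\Delta t^{\rm obs}_{ij}$ and $\Delta t^{\rm pred}_{ij}$ are the observed and model-predicted relative time-delays of the j-th multiple 
image belonging to the i-th family, and $\sigma_{\Delta t_{ij}}$ is the error on the time-delay $\Delta t^{\rm obs}_{ij}$. 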
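
To make the positional and time-delay terms and the likelihood of Eq. 81 concrete, the following minimal sketch evaluates them for a toy configuration. It assumes the usual form of each term, a sum of squared residuals normalized by their uncertainties. The nested-list data layout, the function names, and all numbers are hypothetical; in a real analysis the predicted positions and time-delays would come from the lens model.

```python
def chi2_positions(obs, pred, sigma):
    """Sum over families and images of |theta_obs - theta_pred|^2 / sigma^2.

    obs and pred hold one list of (x, y) positions per family; sigma holds
    the matching positional uncertainties.
    """
    total = 0.0
    for fam_obs, fam_pred, fam_sigma in zip(obs, pred, sigma):
        for (xo, yo), (xp, yp), s in zip(fam_obs, fam_pred, fam_sigma):
            total += ((xo - xp) ** 2 + (yo - yp) ** 2) / s**2
    return total


def chi2_time_delays(obs, pred, sigma):
    """Sum over families and relative delays of ((dt_obs - dt_pred) / sigma)^2."""
    total = 0.0
    for fam_obs, fam_pred, fam_sigma in zip(obs, pred, sigma):
        for dt_o, dt_p, s in zip(fam_obs, fam_pred, fam_sigma):
            total += ((dt_o - dt_p) / s) ** 2
    return total


def log_likelihood(chi2_pos, chi2_td):
    """Eq. 81 up to an additive constant: ln L = -chi^2 / 2."""
    return -0.5 * (chi2_pos + chi2_td)


# Toy data: one family with two images (arcsec), one relative delay (days).
theta_obs = [[(0.0, 0.0), (1.0, 1.0)]]
theta_pred = [[(0.3, 0.4), (1.0, 1.0)]]
sigma_theta = [[0.5, 0.5]]
dt_obs, dt_pred, sigma_dt = [[100.0]], [[90.0]], [[5.0]]

chi2_pos = chi2_positions(theta_obs, theta_pred, sigma_theta)
chi2_td = chi2_time_delays(dt_obs, dt_pred, sigma_dt)
print(chi2_pos, chi2_td, log_likelihood(chi2_pos, chi2_td))
```

A sampler would call these functions at each proposed set of mass and cosmological parameters, after recomputing the predicted image positions and time-delays.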

If the lens model and the cosmology are constrained by the positions of $N^{\rm tot}_{\rm im} = \sum_{i=1}^{N_{\rm fam}} n_i$ observed multiple 
images and $N^{\rm tot}_{\rm td} = \sum_{i=1}^{N_{\rm fam,td}} (n_{i,{\rm td}} - 1)$ relative time-delay measurements, by defining $N_{\rm par}$ as the total number of 
model free parameters, we can write the number of degrees-of-freedom (DoF) of the lens model as: 

$${\rm DoF} = 2 \times N^{\rm tot}_{\rm im} + N^{\rm tot}_{\rm td} - 2 \times N_{\rm fam} - N_{\rm par} = N_{\rm con} - N_{\rm par}. \qquad (84)$$

The term $2 \times N_{\rm fam}$ stems from the fact that the unknown positions of the $N_{\rm fam}$ background sources (2 coordinates 
for each of them) are additional free parameters of the model. Thus, $N_{\rm con}$ is the effective number of available 
constraints. 
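
The bookkeeping of Eq. 84 is easy to get wrong by hand, so a short sketch may help. The function name and the example image counts below are hypothetical and serve only to illustrate the counting.

```python
def degrees_of_freedom(n_images, n_images_td, n_par):
    """Return (DoF, N_con) following Eq. 84.

    n_images    : number of multiple images in each family
    n_images_td : number of images in each family with measured time-delays
    n_par       : number of free model parameters
    """
    n_im_tot = sum(n_images)                      # total observed images
    n_td_tot = sum(n - 1 for n in n_images_td)    # relative time-delays
    n_fam = len(n_images)                         # unknown source positions
    n_con = 2 * n_im_tot + n_td_tot - 2 * n_fam
    return n_con - n_par, n_con


# Three families with 3, 3 and 5 images; the last one is a variable source
# with time-delays measured for all 5 images; 12 free parameters.
dof, n_con = degrees_of_freedom([3, 3, 5], [5], 12)
print(dof, n_con)   # 8 20
```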


3.6.4 Systematic effects 


As described in Sect. 3.5, the strong lens time-delay method has been successfully utilized with quasars lensed by 
galaxies. Several studies (e.g., Birrer et al., 2016; Treu and Marshall, 2016; Suyu et al., 2017) have recognized that, 
in addition to the spectroscopic redshifts of the lens and the source, the most important steps toward accurate and 
precise cosmological measurements are: i) precise time-delays, ii) high-resolution images of the lensed sources, iii) 
precise stellar kinematics of the lens galaxy, and iv) detailed information about the lens environment. Long-term 
monitoring campaigns of lensed quasars with optical, notably by the COSMOGRAIL collaboration (e.g., Tewes et al., 
2013c; Courbin et al., 2018b), or radio (e.g., Fassnacht et al., 2002) telescopes, together with advances in light-curve 
analyses (e.g., Tewes et al., 2013a; Hojjati et al., 2013), have provided precise time-delays. To convert these delays 
to cosmologically relevant quantities, an accurate lens mass model is needed, particularly concerning its radial total 
mass density profile. Steeper profiles yield larger Fermat potential differences between two images, resulting in 
larger inferred values of $H_0$ (Wucknitz, 2002; Kochanek, 2002). In addition to the main lens, there could be other 
mass contributions, associated with galaxies belonging to the same group/cluster of the main lens or with line-of-sight 
structures. If not properly accounted for, this term represents an important source of systematic error, the so-called 
mass-sheet degeneracy (Falco et al., 1985; Schneider and Sluse, 2013), in the model prediction of the time-delays. 
This clarifies why the extended reconstruction of multiple images, the use of independent mass diagnostics (e.g., 
stellar dynamics; see Treu and Koopmans, 2002) for the main lens, and a detailed characterization of its environment 
(i.e., points ii), iii), and iv) listed above) are so relevant to a very accurate total mass model of the lens, thus to the 
success of this cosmological probe (e.g., Suyu et al., 2014; Birrer et al., 2016; McCully et al., 2017; Rusu et al., 2017; 
Sluse et al., 2017b; Shajib et al., 2018; Tihhonova et al., 2018). 


Despite being more complex than that of an isolated galaxy, the strong lensing modeling of a galaxy cluster 
presents some advantages. First, the identification of several multiple images, some of which might be very close 
to the cluster center and radially elongated, provides important information about the slope of the cluster total 
mass density profile (see, e.g., Caminha et al., 2017b). Second, the frequent observations of pairs of angularly close 
multiple images from sources at different redshifts (see, e.g., Grillo et al., 2016) locate very precisely the positions 
of the lens tangential critical curves, thus resulting in precise calibrations of the projected total mass of the cluster 
within different apertures. These facts reduce the need to rely on different total mass diagnostics, such as stellar 
dynamics in lens galaxies. Moreover, the large number of secure and spectroscopically confirmed multiple images 
observed in galaxy clusters allows one to choose the best mass model among the different tested ones (i.e., the best 
reconstruction of the cluster mass components; see Grillo et al., 2015, 2016), according to the value of the minimum 
$\chi^2$. As shown in Grillo et al. (2015, 2016), it is remarkable that all considered mass models lead to statistical and 
systematic relative errors of only a few percent for the cluster total mass. Very good agreement has also been found 
with the measurements from independent total mass diagnostics, e.g. those from weak lensing, dynamical, and 
X-ray observations (see, e.g., Grillo et al., 2015; Balestra et al., 2016; Caminha et al., 2017b). In addition, in a 
galaxy cluster, the modeling of its different mass components (i.e., extended dark-matter haloes, cluster members, 
and possibly hot gas; see, e.g., Bonamigo et al., 2017, 2018; Annunziatella et al., 2017) provides a good first-order 
approximation of possible additional lensing effects (i.e., of the environment) in the regions adjacent to where the 
time-delays can be measured. Some recent studies have exploited kinematic data for the cluster members to model 
more realistically their total mass contribution through scaling relations with non-zero scatter or information from 
the Fundamental Plane relation (e.g., Bergamini et al., 2021a; Granata et al., 2021). In summary, if extensive multi- 
color and spectroscopic information is available in lens galaxy clusters, robust mass maps can be constructed (see 
Grillo et al., 2015; Caminha et al., 2017a; Lagattuta et al., 2017). The feasibility of using the measured time-delays 
of the first multiply-imaged and spatially-resolved supernova (SN “Refsdal”) for measuring $H_0$ with high statistical 
precision has been demonstrated (Grillo et al., 2018), and a full systematic analysis has been performed (Grillo 
et al., 2020). Adding to the model a uniform sheet of mass at the cluster redshift or a cluster main mass density 
profile with a variable slope (optimized together with all the other model parameters) results in $H_0$ probability 
distribution functions that are just slightly broader than those without these extra model parameters. Based on our 
previous studies (see, e.g., Chirivì et al., 2018, on the influence of mass structures along the line of sight on lensing 
modeling), systematic effects in lens galaxy clusters seem to be controlled to a level similar to or even lower than 
the statistical uncertainties, given the exquisite datasets in hand and soon becoming available, making time-delay 
cluster cosmography a potentially very competitive method. 


Finally, we remark that in any cluster strong lensing model the values of the cosmological parameters and those 
defining the mass distribution of the lens are not independent, and they cannot be considered separately in obtaining 
model-predicted quantities (e.g., the time-delays, positions, and flux ratios of the multiple images). Therefore, 
the results obtained by simplistically keeping the cluster mass distribution fixed are likely to underestimate the 
uncertainty on the values of the cosmological parameters, and possibly introduce biases, since they neglect the 
covariance between the cosmological and cluster mass model parameters (see, e.g., Acebron et al., 2017). Zitrin et al. 
(2014) confirm that the values of the cosmological parameters are biased when they are estimated by applying a 
fixed cluster mass distribution for correcting the luminosity distances of lensed SNe Ia. 


3.6.5 Main results and forecasts 


As detailed in Sect. 3.6.1, time-delay distances are primarily sensitive to the value of $H_0$, and more mildly to those 
of other cosmological parameters. In galaxy clusters, usually showing several multiple images, different values of 
the family ratio (see Eq. 77) can be used at the same time to add constraints on the values of the cosmological 
matter ($\Omega_m$) and dark-energy ($\Omega_{de}$) density parameters, defining the global geometry of the Universe. In general, 
the cosmological contribution is difficult to disentangle from that associated with the total mass of a lens, because 
of a strong degeneracy between the two. However, when a significant number of multiply lensed sources (with 
spectroscopic redshifts spanning a wide range) is present, valuable information about the cosmological parameters 
can be inferred. This technique has been applied without time-delay measurements in the galaxy clusters Abell 2218 
(Soucail et al., 2004), Abell 1689 (Jullo et al., 2010) and, more recently, RXC J2248.7−4431 (Caminha et al., 2016), 
and in combination with time-delay measurements in MACS J1149.5+2223 for the first time (see Fig. 3 of Grillo 
et al., 2018). 


In Caminha et al. (2016), by exploiting the observed positions of 47 multiple images, 24 of which are spectroscopically 
confirmed, from a total of 16 background sources over the redshift range 1.0-6.1, a comprehensive study of the total 
mass distribution of the galaxy cluster RXC J2248.7−4431 with a set of high precision strong lensing models has 
yielded lensing-only measurements of the values of $\Omega_m$ and $w$ with $1\sigma$ statistical uncertainties of, respectively, 
~ 40% to ~ 60% (depending on the adopted cosmological model) and ~ 30%. In Caminha et al. 
(2021), thanks to a sample of five detailed cluster total mass models, it has been demonstrated that strong lensing 
measurements of the values of the cosmological parameters are complementary and in good agreement with the 
estimates from the CMB, BAO, and SNe Ia. In particular, the strong lensing cosmographic analysis has made it possible 
to improve the constraints from the CMB on the values of $\Omega_m$ and $w$ (in a flat wCDM model) by factors of 2.5 and 
4.0, respectively. 


By using the observed positions of 89 multiple images, with extensive spectroscopic information, from 28 background 
sources and the measured time-delays between the images S1-S4 and SX of SN Refsdal, Grillo et al. (2018) 
have inferred blindly the values of $H_0$ and $\Omega_m$ with relative ($1\sigma$) statistical errors of, respectively, 6% (7%) and 
31% (26%) in flat (general) cosmological models, assuming a conservative 3% uncertainty on the final time-delay of 
image SX and, remarkably, no priors from other cosmological experiments. Moreover, by investigating separately 
the impact of a constant sheet of mass at the cluster redshift (see Fig. 24), of a power-law profile for the mass 
density of the cluster main halo and of some scatter in the cluster member scaling relations, Grillo et al. (2020) have 
found that, in a flat ΛCDM cosmology, these systematic effects do not introduce a significant bias on the inferred 
values of $H_0$ and $\Omega_m$, and that the statistical uncertainties dominate the total error budget: a 3% uncertainty on 
the time-delay of image SX translates into approximately 6% and 40% (including both statistical and systematic 
$1\sigma$) uncertainties for $H_0$ and $\Omega_m$, respectively. They have also presented the interesting possibility of measuring the 
value of the EoS parameter $w$ of the dark energy density with a 30% uncertainty (see Fig. 23). 


By comparing different results of the strong lens time-delay method, with SN Refsdal in MACS J1149.5+2223 
(Grillo et al., 2020) and with lensed quasars in the galaxy-scale systems of the H0LiCOW program (Suyu et al., 2017), 
we can conclude that i) the relative error on the inferred value of $H_0$ from a single (galaxy or cluster) strong lensing 
system is similar (mean value of 6.4% in Fig. 2 of Wong et al., 2020), ii) in a single lens cluster, there is the additional 
possibility of estimating the value of $\Omega_m$ (and $w$), thanks to the observations of different multiple-image families with 
spectroscopically confirmed redshifts and to the measurements of the time-delay value between the multiple images 
of intrinsically variable sources, and iii) the observed positions of many spectroscopic multiple images (some of which 
are key to locating the lens tangential and radial critical curves) provide precise calibrations of the different mass 
components (i.e., extended dark-matter halos, cluster members, and hot gas) considered in the model of a galaxy 
cluster and, thus, also a good approximation of the effect of the “environment” where the time-delays are measured. 


In particular, it has been tested on models with either the entire sample of 89 multiple images from 28 sources 
at different redshifts, or only the 63 multiple images from SN Refsdal and its host, all at the same redshift, that 
in the former case the effect of the so-called “mass-sheet degeneracy” is significantly reduced (Grillo et al., 2020). 
More quantitatively, this has produced an approximately 9% difference in the median value of $H_0$ (and of $\Omega_m$), and 
a remarkable reduction by a factor of more than 3, from ~ 21% to ~ 6%, for its uncertainty (from ~ 63% to ~ 40% 
for the uncertainty on $\Omega_m$). 


In each lens galaxy cluster, the combination of the positions of several tens of spectroscopically confirmed multiple 
images and of one or more time-delays between the multiple images of a lensed QSO or SN will allow one to determine 
the lens Fermat potential differences with a ~ 5% uncertainty (including both the statistical and systematic errors, 
as shown by Grillo et al., 2018, 2020; Acebron et al., 2021). The planned modeling of the extended surface brightness 
distributions and kinematic maps of some of the multiple images will very likely reduce this uncertainty below 5%. 
For each time-varying source lensed by a galaxy cluster, the longest time-delay between its multiple images will be 
measured with a ~ 2% error (as obtained so far for the known systems). This will result in a <6% total uncertainty 
on the value of $H_0$ estimated from a single lens cluster. The datasets that are already available for the first three lens 
clusters will provide a combined ~3% uncertainty on $H_0$, which will be reduced to <2% when a sample of 
~ 10 lens clusters is completed, thanks also to the new data from the Rubin survey. 
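
The quoted error budget can be reproduced with simple arithmetic under two assumptions that are ours rather than statements from the cited works: (a) the Fermat potential and time-delay errors are independent, so their relative errors add in quadrature in $H_0 \propto \Delta\phi/\Delta t$, and (b) the clusters are independent and have equal per-cluster errors, so the combined error scales as $1/\sqrt{N}$. The function names are illustrative.

```python
import math


def single_cluster_h0_error(fermat_err_pct, delay_err_pct):
    """Relative H0 error (percent) from one cluster, errors added in quadrature."""
    return math.hypot(fermat_err_pct, delay_err_pct)


def combined_h0_error(single_err_pct, n_clusters):
    """Relative H0 error (percent) from n independent, equally precise clusters."""
    return single_err_pct / math.sqrt(n_clusters)


single = single_cluster_h0_error(5.0, 2.0)   # ~5% Fermat potential, ~2% delay
for n in (1, 3, 10):
    print(n, round(combined_h0_error(single, n), 1))
```

Under these assumptions a single cluster gives about 5.4%, three clusters about 3.1%, and ten clusters about 1.7%, consistent with the <6%, ~3%, and <2% figures quoted above.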
Figure 23: Confidence regions (at 1 and $2\sigma$ levels) and median values (crosses) of $H_0$, $w$ and $\Omega_m$ obtained 
from the lensing models of SN Refsdal (adapted from Grillo et al., 2020). Dotted lines correspond to the 16th 
and 84th percentiles for each parameter. A time-delay between SX and S1 of 345 days with a 2% relative error 
is adopted. Flat wCDM models ($\Omega_m + \Omega_{de} = 1$) with uniform priors on the values of the cosmological parameters 
($H_0 \in [20, 120]$ km s$^{-1}$ Mpc$^{-1}$, $\Omega_m \in [0, 1]$ and $w \in [-2, 0]$) are considered. Constraints on the matter density and 
dark energy EoS parameters are mostly due to the angular diameter distance ratios (Eq. 77), whereas those on 
the Hubble constant are mainly driven by optimizing the measured time-delay of SN Refsdal with the blind mass 
model by Grillo et al. (2016). 
Figure 24: The impact of the mass sheet degeneracy (MSD) on the lensing model of SN Refsdal, where $\kappa_0$ 
is the value of the convergence of a constant sheet of mass at the cluster redshift (Grillo et al., 2020). In red: 
confidence contour levels at 1 and $2\sigma$ for $H_0$ and $\kappa_0$ obtained using all (89) multiple images at different redshifts. 
A time-delay between SX and S1 of 345 ± 10 days is adopted. In this case the best fit model yields a vanishing 
mass-sheet (see vertical dotted line). In gray: confidence regions obtained from a model using 
only those images (63) belonging to SN Refsdal and its host, all at z = 1.49. The dashed-dotted line illustrates 
the theoretical effect of the MSD (Schneider and Sluse, 2013). Flat ΛCDM models ($\Omega_m + \Omega_\Lambda = 1$) with uniform 
priors on the values of the cosmological parameters ($H_0 \in [20, 120]$ km s$^{-1}$ Mpc$^{-1}$ and $\Omega_m \in [0, 1]$) and on the value 
of $\kappa_0$ ($\in [-0.2, 0.2]$ or $[-0.5, 0.5]$) are considered. 
3.7 Cosmic Voids 


The largest discernible structures of the Universe make up the so-called cosmic web. It represents a network of 
compact nodes that are connected by filaments and walls of lower density (Zeldovich, 1970). The remaining space is 
taken up by cosmic voids, extended regions of very low matter content (e.g., Zeldovich et al., 1982; Bertschinger, 1985; 
van de Weygaert and van Kampen, 1993). The nodes are occupied by groups and clusters of galaxies, which makes 
them the most luminous and thus best identifiable individual structures at cosmological distances. The contrary is 
the case for voids, which host the least luminous galaxies in the cosmos and were only discovered in the late 
1970s (Gregory and Thompson, 1978; Jõeveer et al., 1978). A systematic identification of voids not only requires a 
complete sampling of their boundaries, consisting of filaments and walls, but also the sensitivity to detect the faintest 
galaxies in their interiors. This has only recently become feasible with the advance of wide and deep redshift surveys 
that are able to reveal the three-dimensional structure of the cosmic web in great detail (e.g., see Pan et al., 2012; 
Sutter et al., 2012b; Micheletti et al., 2014; Mao et al., 2017b; Sánchez et al., 2017; Achitouv et al., 2017; Brouwer 
et al., 2018; Hawken et al., 2020, for some of the first void catalogs obtained from SDSS, VIPERS, BOSS, DES, 
6dFGS, KiDS, and eBOSS). 


Since then, void catalogs of ever-growing size have been compiled and analyzed to tackle unanswered questions in 
various fields of cosmology and astrophysics. For example, voids can be used to study environmental effects in the 
formation and evolution of galaxies (e.g., Hoyle et al., 2005; Patiri et al., 2006; Kreckel et al., 2012; Ricciardelli et al., 
2014a; Habouzit et al., 2020; Panchal et al., 2020), to investigate the nature of gravity with the motivation to find 
modifications to the general theory of relativity (e.g., Clampitt et al., 2013; Spolyar et al., 2013; Zivick et al., 2015; 
Cai et al., 2015; Barreira et al., 2015; Hamaus et al., 2015; Achitouv, 2016; Voivodic et al., 2017; Falck et al., 2018; 
Sahlén and Silk, 2018; Baker et al., 2018; Paillas et al., 2019; Davies et al., 2019; Perico et al., 2019; Alam et al., 2020; 
Contarini et al., 2021; Wilson and Bean, 2021), or to reveal unknown properties of the standard model ingredients 
in cosmology, namely its initial conditions (Chan et al., 2019), dark energy (e.g., Lee and Park, 2009; Biswas et al., 
2010; Lavaux and Wandelt, 2012; Sutter et al., 2012a; Bos et al., 2012; Hamaus et al., 2014a; Pisani et al., 2015a; 
Pollina et al., 2016; Verza et al., 2019), dark matter (e.g., Leclercq et al., 2015; Yang et al., 2015; Reed et al., 2015; 
Baldi and Villaescusa-Navarro, 2018), and neutrinos (Massara et al., 2015; Banerjee and Dalal, 2016; Sahlén, 2019; 
Kreisch et al., 2019; Schuster et al., 2019; Zhang et al., 2020; Bayer et al., 2021; Kreisch et al., 2021). It is the 
under-dense character of voids that makes them particularly sensitive to homogeneous or diffuse components of our 
Universe, such as dark energy and neutrinos. For example, dark energy dominates the matter-energy budget inside 
voids much earlier than in the cosmos as a whole. Thanks to their small mass, neutrinos can freely stream into the 
deep interiors of voids, while baryons and dark matter are mostly restricted to their boundaries due to gravitational 
interaction. Finally, screening mechanisms that efficiently hide possible deviations from general relativity in regions 
of high density or deep gravitational potential are not effective inside voids. 


In order to encompass such a wide range of topics, various void-related observables have been considered. This 
includes cross-correlations with the CMB, which provide detections of the integrated Sachs-Wolfe effect (ISW, e.g., 
Granett et al., 2008; Ilić et al., 2013; Cai et al., 2014; Planck Collaboration et al., 2014b; Nadathur and Crittenden, 
2016; Kovács et al., 2019, 2021) and of the Sunyaev-Zeldovich (SZ) effect (Alonso et al., 2018), or correlations 
with the distorted shapes of galaxies, revealing the matter content of voids via the gravitational lensing effect (e.g., 
Melchior et al., 2014; Clampitt and Jain, 2015; Gruen et al., 2016; Sánchez et al., 2017; Cai et al., 2017; Brouwer 
et al., 2018; Fang et al., 2019; Vielzeuf et al., 2021; Jeffrey et al., 2021). However, voids may also serve as cosmological 
probes themselves, because their dynamics are governed by the same physical laws that describe the evolution of 
the Universe as a whole. This enables us to predict their properties from first principles, and to compare these 
predictions with observations in order to constrain cosmological models. 


3.7.1 Basic idea and equations 


In this section, we discuss two of the most studied observables that have been investigated for cosmological applica- 
tions with voids so far: the void size function and the void density profile (or void-galaxy cross-correlation function). 
These two observables are affected by the so-called Alcock and Paczynski (1979) (AP) effect (e.g., Ryden, 1995; 
Sutter et al., 2012a, 2014d; Hamaus et al., 2014a, 2016; Mao et al., 2017a; Correa et al., 2019; Endo et al., 2020; 
Nadathur et al., 2020; Paillas et al., 2021) and by redshift-space distortions (RSD) (e.g., Ryden and Melott, 1996; 
Padilla et al., 2005; Paz et al., 2013; Pisani et al., 2015b; Hamaus et al., 2015, 2017; Cai et al., 2016; Chuang et al., 
2017; Hawken et al., 2017, 2020; Achitouv, 2019; Aubert et al., 2020; Correa et al., 2021, 2022), which themselves 
carry cosmologically relevant information. For other methods that employ voids as cosmological probes, such as 
their pairwise clustering statistics on large scales (e.g., Hamaus et al., 2014c,a; Chan et al., 2014; Zhao et al., 2016; 
Chuang et al., 2017; Lares et al., 2017b; Voivodic et al., 2020), the associated baryon acoustic oscillation (BAO) 
feature (e.g., Kitaura et al., 2016; Liang et al., 2016; Chan and Hamaus, 2021), or the velocity statistics of voids 
(e.g., Sutter et al., 2014a; Ruiz et al., 2015; Lambas et al., 2016; Ceccarelli et al., 2016; Wojtak et al., 2016; Lares 
et al., 2017a), we refer the reader to the provided references. 


3.7.1.1 Void size function 


The void size function dn(R, z)/dR specifies the number density of voids of a given size R at redshift z. It is also 
known as void abundance. One can think of it in analogy to the cluster mass function dn(M, z)/dM, with the 
advantage of the void size being a directly observable quantity. In contrast, the cluster mass M can in practice 
only be related to other observables, such as richness or X-ray luminosity. The void size function has already been 
measured in current data (see Fig. 25), but has not yet been used to extract cosmological constraints (however, see 
Sahlén et al., 2016, for constraints from extreme-value statistics of voids). The increase in expected void numbers 
from upcoming surveys in the next decade and the strong modeling activity performed on simulations will soon allow 
first applications to observational data (Pisani et al., 2019). Theoretical models for the void size function allow us 
to predict void numbers in the dark matter distribution from first principles (e.g., Sheth and van de Weygaert, 2004; 
Furlanetto et al., 2006; Platen et al., 2007; Paranjape et al., 2012; Jennings et al., 2013; Pisani et al., 2015a). By 
accounting for tracer bias, it is possible to relate those predictions to observable voids in the tracer distribution 
(Pollina et al., 2016; Ronconi and Marulli, 2017; Ronconi et al., 2019; Contarini et al., 2019), thereby providing 
estimates of expected void numbers in large-scale structure surveys. First and foremost, predicting void numbers 
is an important task, necessary to perform accurate forecasts for other probes relying on the statistics of voids. 
However, it turns out that the void size function is an extremely sensitive probe of cosmology in itself: by counting 
voids of different sizes in surveys one can obtain constraints on the dark energy EoS (Pisani et al., 2015a; Verza et al., 
2019), the presence of massive neutrinos (Sahlén, 2019; Kreisch et al., 2019; Schuster et al., 2019; Kreisch et al., 
2021), and modified gravity (Clampitt et al., 2013; Lam et al., 2015; Cai et al., 2015; Zivick et al., 2015; Sahlén et al., 
2016; Contarini et al., 2021). 


The most common setup to obtain predictions relies on the excursion-set formalism (Bond et al., 1991), applied 
to the hierarchical evolution of cosmic voids. It was first developed by Sheth and van de Weygaert (2004) and 
was later extended by Jennings et al. (2013). Excursion-set theory provides predictions for void numbers based on 
spherical fluctuations in the initial (Lagrangian) density field. It calculates their conditional first-crossing distribution 
$f_{\ln\sigma}(\sigma)$ as a function of the root mean square matter fluctuations $\sigma$, smoothed on a scale $R_L$. A fluctuation becomes 
a void when its Lagrangian density contrast $\delta^L$, filtered on the scale $R_L$, reaches the void formation threshold $\delta_v^L$ 
without crossing the collapse threshold $\delta_c^L$ on a scale larger than $R_L$. The thresholds are determined via the nonlinear 
evolution of a spherically symmetric top-hat fluctuation (Icke, 1984); the moment of shell crossing conventionally 
defines the formation of a void (Bertschinger, 1985; Blumenthal et al., 1992). The void size function in Lagrangian 
space is then given by: 

$$\frac{dn_L}{d\ln R_L} = \frac{f_{\ln\sigma}(\sigma)}{V(R_L)}\,\frac{d\ln\sigma^{-1}}{d\ln R_L}, \qquad (85)$$

where $V(R_L) = 4\pi R_L^3/3$, and the first-crossing distribution is (Sheth and van de Weygaert, 2004): 

$$f_{\ln\sigma}(\sigma) = 2\sum_{j=1}^{\infty} e^{-\frac{(j\pi x)^2}{2}}\, j\pi x^2 \sin(j\pi D), \qquad (86)$$

with: 

$$D = \frac{|\delta_v^L|}{\delta_c^L + |\delta_v^L|}, \qquad x = \frac{D}{|\delta_v^L|}\,\sigma. \qquad (87)$$


The label L indicates all quantities that are evaluated following linear theory in Lagrangian space. To ensure volume 
conservation between the linear and nonlinear density field, Jennings et al. (2013) impose: 


$$V(R)\,dn = V(R_L)\,dn_L\big|_{R_L = R_L(R)}. \qquad (88)$$
Figure 25: Left: Void size function from the final BOSS data in the redshift range 0.20 < z < 0.75. Right: 
Projected void density profile (void-galaxy cross-correlation function, red wedges) from the final BOSS data, 
and its real-space counterpart after deprojection (green triangles). The redshift-space monopole of the density 
profile (blue dots) is shown along with its best-fit model (blue solid line). Images reproduced with permission 
from Hamaus et al. (2020), copyright by IOP Publishing. 


Together with the equality $d\ln R = d\ln R_L$, which applies for the spherical top-hat model, one obtains the final
expression for the so-called Vdn model, an extension of the original Sheth and van de Weygaert model:

$$\frac{dn}{d\ln R} = \frac{f_{\ln\sigma}(\sigma)}{V(R)}\,\frac{d\ln\sigma^{-1}}{d\ln R_L}\bigg|_{R_L = R_L(R)} . \tag{89}$$
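As an illustration of Eqs. 86, 87, and 89, the following minimal Python sketch evaluates the first-crossing distribution by truncating the infinite sum and converts it into a Vdn size function. Several ingredients are assumptions made only for this example and are not taken from the text: the linear thresholds $\delta_v^L = -2.71$ and $\delta_c^L = 1.686$, the hypothetical power-law `toy_sigma` standing in for a proper $\sigma(R_L)$ computed from a linear power spectrum, and the top-hat relation $R_L = R\,(1+\delta_v)^{1/3}$ with $\delta_v = -0.8$ used to map Eulerian to Lagrangian radii.

```python
import numpy as np

DELTA_V_LIN = -2.71   # assumed linear void-formation threshold (shell crossing)
DELTA_C_LIN = 1.686   # assumed linear collapse threshold

def first_crossing(sigma, delta_v=DELTA_V_LIN, delta_c=DELTA_C_LIN, jmax=500):
    """Eqs. 86-87: f_lnsigma(sigma), with the infinite sum truncated at jmax."""
    sigma = np.atleast_1d(np.asarray(sigma, dtype=float))
    D = abs(delta_v) / (delta_c + abs(delta_v))
    x = D * sigma / abs(delta_v)
    j = np.arange(1, jmax + 1)[:, None]            # column of summation indices
    terms = (np.exp(-0.5 * (j * np.pi * x) ** 2)
             * j * np.pi * x**2 * np.sin(j * np.pi * D))
    return 2.0 * terms.sum(axis=0)

def toy_sigma(R_L, sigma8=0.8, slope=0.9):
    """Hypothetical power-law stand-in for sigma(R_L), with R_L in Mpc/h."""
    return sigma8 * (R_L / 8.0) ** (-slope)

def vdn_size_function(R, delta_v_nl=-0.8, eps=1e-4):
    """Eq. 89: dn/dlnR in (h/Mpc)^3 for an Eulerian void radius R in Mpc/h."""
    R = np.atleast_1d(np.asarray(R, dtype=float))
    R_L = R * (1.0 + delta_v_nl) ** (1.0 / 3.0)    # top-hat mass conservation
    # centered finite difference for dln(sigma^-1)/dln(R_L)
    ln_hi = np.log(toy_sigma(R_L * np.exp(eps)))
    ln_lo = np.log(toy_sigma(R_L * np.exp(-eps)))
    dlnsiginv_dlnRL = -(ln_hi - ln_lo) / (2.0 * eps)
    V = 4.0 * np.pi * R**3 / 3.0
    return first_crossing(toy_sigma(R_L)) / V * dlnsiginv_dlnRL

print(vdn_size_function([10.0, 15.0, 20.0]))
```

The alternating series suffers from round-off cancellation when $\sigma$ becomes very small (very large voids), so a production code would likely switch to an asymptotic form in that regime.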


However, in order to apply this model to data it is necessary to consider that, in practice,
voids are found in the distribution of tracers of the matter density field, which are typically galaxies. Moreover,
the structures identified by a shape-agnostic void finding algorithm are not the idealized, spherically symmetric, and
isolated objects assumed in the theoretical model (e.g., Platen et al., 2007; Neyrinck, 2008; Sutter et al., 2015, see
Sect. 3.7.2). To align the theory with observations, two important steps need to be taken. First, the
measured properties of real voids need to be linked with the idealized top-hat model, such that their size and depth
agree. For example, this can be achieved by identifying a sphere of radius $R$ around the void center within which the
average density contrast equals a given threshold $\delta_v$. The spherical top-hat model suggests using $\delta_v \simeq -0.8$ at the moment of shell crossing
as a natural choice (Bertschinger, 1985; Blumenthal et al., 1992), but in principle the model remains valid for
any other value (Jennings et al., 2013; Verza et al., 2019). Second, density fluctuations in the tracer distribution
are biased with respect to the matter density field (see Sect. 3.7.4). Therefore, a model for tracer bias needs to be
incorporated in the theoretical formalism to predict the observable void size function (Ronconi et al., 2019; Contarini
et al., 2019, 2021).


3.7.1.2 Void density profile 

Apart from their size, voids are characterized by their unique composition and geometry. While these properties
may vary significantly from one void to another, they are better defined in an ensemble-average sense. For
example, in a statistically homogeneous and isotropic universe the average density profile of voids exhibits some
universal characteristics: an extended under-dense core and a steep density run towards the void boundary (e.g., see
Ricciardelli et al., 2014b; Hamaus et al., 2014b, and Fig. 25). The boundary itself features an over-dense ridge whose
amplitude diminishes for increasingly large voids (e.g., Sheth and van de Weygaert, 2004; Ceccarelli et al., 2013).
These characteristics can be parameterized by analytical fitting formulae for the isotropic void density profile. For
example, one well-explored expression is given by:

$$\delta(r) = \delta_c\,\frac{1 - (r/r_s)^{\alpha}}{1 + (r/R)^{\beta}} , \tag{90}$$


where $\delta = \rho/\bar\rho - 1$ is the density contrast with respect to the background density of the Universe $\bar\rho$ (Hamaus et al.,
2014b). For voids with an effective radius $R$, it expresses the average density fluctuation as a function of comoving
distance $r$ from the void center and contains four parameters: a scale radius $r_s$ that determines where the density
equals the background value, a central under-density $\delta_c$, and two power-law indices $\alpha$ and $\beta$ that control its inner and
outer slopes. It has further been shown that the latter two parameters linearly scale with $r_s$, which can be exploited
to reduce the dimensionality of the parameter space for the density profile to two. While the form of Eq. 90 has
been motivated and tested by simulation studies (e.g., Sutter et al., 2014b; Barreira et al., 2015; Pollina et al., 2017;
Falck et al., 2018; Baker et al., 2018; Perico et al., 2019; Stopyra et al., 2021; Shim et al., 2021; Tavasoli, 2021), it
is also in good agreement with observations (e.g., Sánchez et al., 2017; Chantavat et al., 2017; Pollina et al., 2019;
Fang et al., 2019).
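The four-parameter profile described above is straightforward to evaluate numerically. The short Python sketch below assumes the functional form $\delta(r) = \delta_c\,[1-(r/r_s)^\alpha]/[1+(r/R)^\beta]$; the parameter values are hypothetical and chosen only to display the qualitative features (under-dense core, zero crossing at $r_s$, compensation ridge, return to the mean density at large $r$).

```python
import numpy as np

def void_density_profile(r, R, delta_c, r_s, alpha, beta):
    """Average density contrast at comoving distance r from the void center."""
    r = np.asarray(r, dtype=float)
    return delta_c * (1.0 - (r / r_s) ** alpha) / (1.0 + (r / R) ** beta)

# Hypothetical parameters for a void with effective radius R = 20 Mpc/h
params = dict(R=20.0, delta_c=-0.8, r_s=18.0, alpha=2.0, beta=9.0)

r = np.linspace(0.0, 60.0, 7)
print(void_density_profile(r, **params))
```

By construction the profile equals $\delta_c$ at the center and vanishes at $r = r_s$, which offers a quick sanity check on any fitted parameter set.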


However, in redshift surveys the assumption of spherical symmetry is violated due to RSD. They arise as a
consequence of the peculiar motions of galaxies on top of the Hubble flow, causing a Doppler shift in their emitted
spectrum. This affects the distance-redshift relation, which only accounts for a Hubble redshift $z_h$. As a result, the
comoving location $\mathbf{x}$ of a galaxy with observed redshift $z$ is given by:

$$\mathbf{x}(z) = \mathbf{x}(z_h) + \frac{1+z_h}{H(z_h)}\,\mathbf{v}_\parallel , \tag{91}$$

where $\mathbf{v}_\parallel$ is the component of the galaxy velocity vector $\mathbf{v}$ along the line of sight, relative to the observer. The same
argument applies to the location of a void center $\mathbf{X}$ at redshift $Z$ (we use capitals to designate void properties), i.e.
for the separation vector $\mathbf{s}$ between galaxy and void center in redshift space we obtain:

$$\mathbf{s} = \mathbf{x}(z) - \mathbf{X}(Z) \simeq \mathbf{x}(z_h) - \mathbf{X}(Z_h) + \frac{1+z_h}{H(z_h)}\left(\mathbf{v}_\parallel - \mathbf{V}_\parallel\right) = \mathbf{r} + \frac{1+z_h}{H(z_h)}\,\mathbf{u}_\parallel , \tag{92}$$
where $\mathbf{r}$ is their comoving separation in real space and $\mathbf{u}_\parallel = \mathbf{v}_\parallel - \mathbf{V}_\parallel$ their relative velocity along the line of sight.
Thus, a description of the mapping between real and redshift space requires a model for the dynamics of voids. It has
been shown that the assumptions of average spherical symmetry and local mass conservation at linear order in the
density contrast provide an accurate relation for the relative velocity field $\mathbf{u}$ (Peebles, 1980; Hamaus et al., 2014b):

$$\mathbf{u}(\mathbf{r}) = -\frac{f(z_h)}{3}\,\frac{H(z_h)}{1+z_h}\,\Delta(r)\,\mathbf{r} , \tag{93}$$

where $f$ is the linear growth rate of density perturbations and $\Delta(r)$ the average density contrast within a radius
$r = |\mathbf{r}|$ from the void center:

$$\Delta(r) = \frac{3}{r^3}\int_0^r \delta(r')\,r'^2\,dr' . \tag{94}$$

The vector $\mathbf{u}$ is directed along the radial direction $\mathbf{r}$ from the void center in real space, so the coordinate mapping
from Eq. 92 can now be written in terms of $\mathbf{r}$ and its component along the line of sight $\mathbf{r}_\parallel$:

$$\mathbf{s} = \mathbf{r} - \frac{f(z_h)}{3}\,\Delta(r)\,\mathbf{r}_\parallel . \tag{95}$$
According to this equation, the coordinate transformation between real and redshift space is fully determined by the
growth rate and the void density profile in real space, e.g. via the fitting formula of Eq. 90. The linear growth rate $f$
depends only on the cosmological model; within General Relativity it is well approximated by a power law of the matter
content $\Omega_m(z)$ at redshift $z$ with a growth index of $\gamma \simeq 0.55$, $f(z) = \Omega_m(z)^{\gamma}$ (Lahav et al., 1991a; Linder, 2005).
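The growth-index approximation can be evaluated in a few lines. The sketch below assumes a flat $\Lambda$CDM background with radiation neglected, so that $\Omega_m(z) = \Omega_{m,0}(1+z)^3 / [\Omega_{m,0}(1+z)^3 + 1 - \Omega_{m,0}]$; the function names are illustrative only.

```python
def omega_m_of_z(z, omega_m0):
    """Matter density parameter at redshift z (flat LCDM, radiation neglected)."""
    a3 = (1.0 + z) ** 3
    return omega_m0 * a3 / (omega_m0 * a3 + 1.0 - omega_m0)

def growth_rate(z, omega_m0, gamma=0.55):
    """Linear growth rate f(z) = Omega_m(z)^gamma."""
    return omega_m_of_z(z, omega_m0) ** gamma

print(growth_rate(0.5, 0.3))
```

In an Einstein-de Sitter universe ($\Omega_{m,0} = 1$) the expression returns $f = 1$ at all redshifts, as expected.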


The coordinate mapping in Eq. 95 leads to an anisotropic distortion of voids along the line of sight, as illustrated
schematically in Fig. 26. Therefore, an isotropic density profile is no longer sufficient to describe the average geometry
and composition of voids. Instead, the corresponding observable quantity is the void-galaxy cross-correlation function
$\xi^s(\mathbf{s})$ in redshift space, which not only depends on the magnitude $s$ of the separation vector, but also on the cosine
of its angle to the line of sight $\mu_s = s_\parallel/s$. Because the number of galaxies around every void is conserved in the
mapping from real to redshift space, the Jacobian $\partial\mathbf{s}/\partial\mathbf{r}$ relates $\delta(r)$ to $\xi^s(\mathbf{s})$ via:

$$\int_V \left[1 + b\,\delta(\mathbf{r})\right] d^3r = \int_V \left[1 + \xi^s(\mathbf{s})\right] \det\!\left(\frac{\partial\mathbf{s}}{\partial\mathbf{r}}\right) d^3r . \tag{96}$$

Figure 26: Schematic representation of a void in real (left) and in redshift space (right). The separation vector $\mathbf{r}$
between its center at $\mathbf{X}$ and a galaxy at $\mathbf{x}$ in real space is transformed via $\mathbf{s} = \mathbf{r} + \mathbf{u}_\parallel$ to redshift space, where
$\mathbf{u}_\parallel = \mathbf{v}_\parallel - \mathbf{V}_\parallel$ is the relative line-of-sight velocity between them. For simplicity the illustration displays $\mu$ instead
of $\arccos(\mu)$ to indicate angles to the line of sight and uses velocity displacements in units of $(1+z_h)/H(z_h)$.
Image reproduced with permission from Hamaus et al. (2020), copyright by IOP Publishing.


Here we have additionally assumed a linear bias relation of the form $\xi(r) = b\,\delta(r)$ between galaxy and matter
over-densities in real space, with a scale-independent bias parameter $b$ (see Sect. 3.7.4). This assumption has been
investigated with the help of N-body simulations (Sutter et al., 2014b; Pollina et al., 2017; Contarini et al., 2019;
Ronconi et al., 2019), but also with galaxy-clustering and weak-lensing observations (Pollina et al., 2019; Fang et al.,
2019), and was found to be remarkably accurate. Using Eq. 95 inside Eq. 96 and an expansion to linear order in
$\delta(r)$, one finally arrives at (Cai et al., 2016; Hamaus et al., 2017):

$$\xi^s(\mathbf{s}) = b\,\delta(r) + \frac{f}{3}\,\Delta(r) + f\mu_r^2\left[\delta(r) - \Delta(r)\right] , \tag{97}$$
where $\mu_r = r_\parallel/r$. Given a density profile $\delta(r)$ and the mapping between $\mathbf{s}$ and $\mathbf{r}$ from Eq. 95, one can now evaluate
the void-galaxy cross-correlation function $\xi^s$ for any observed separation vector $\mathbf{s}$. Since $\mathbf{r}$ is unknown, one may
initially evaluate $\Delta(r)$ at $r = s$ and then calculate $\mathbf{r}$ via iteratively applying the following set of equations (Hamaus
et al., 2020):

$$r = \sqrt{r_\parallel^2 + r_\perp^2} , \qquad r_\perp = s_\perp , \qquad r_\parallel = \frac{s_\parallel}{1 - \frac{f}{3}\Delta(r)} , \tag{98}$$

where $s_\perp$ is the component of $\mathbf{s}$ perpendicular to the line of sight, and hence unaffected by RSD. Equation 97 can also
be expanded in terms of Legendre polynomials, with monopole and quadrupole as the only non-vanishing multipoles
at linear order (Cai et al., 2016).
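The linear model can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the implementation used in the cited works: the density profile is supplied on a radial grid starting at $r = 0$, the integrated contrast $\Delta(r)$ is obtained with a trapezoidal rule in the variable $r^3$, and the inversion from $\mathbf{s}$ to $\mathbf{r}$ uses a fixed number of fixed-point iterations of $r_\parallel = s_\parallel / [1 - \tfrac{f}{3}\Delta(r)]$, starting from $r = s$. All function names are hypothetical.

```python
import numpy as np

def integrated_contrast(r_grid, delta_grid):
    """Delta(r) = 3/r^3 * int_0^r delta(r') r'^2 dr', for a grid with r_grid[0] = 0.
    The integral is done in u = r^3, which is exact for a constant profile."""
    u = r_grid**3
    steps = 0.5 * (delta_grid[1:] + delta_grid[:-1]) * np.diff(u)
    cumulative = np.concatenate(([0.0], np.cumsum(steps)))
    Delta = np.empty_like(r_grid, dtype=float)
    Delta[1:] = cumulative[1:] / u[1:]
    Delta[0] = delta_grid[0]                     # limit r -> 0
    return Delta

def real_space_coords(s_par, s_perp, f, r_grid, Delta_grid, n_iter=5):
    """Iterate r_par = s_par / (1 - f/3 * Delta(r)), starting from r = s."""
    r = np.hypot(s_par, s_perp)
    for _ in range(n_iter):
        r_par = s_par / (1.0 - f / 3.0 * np.interp(r, r_grid, Delta_grid))
        r = np.hypot(r_par, s_perp)
    return r, r_par

def xi_s_linear(s_par, s_perp, f, b, r_grid, delta_grid):
    """Linear model: xi^s = b*delta + f/3*Delta + f*mu_r^2*(delta - Delta)."""
    Delta_grid = integrated_contrast(r_grid, delta_grid)
    r, r_par = real_space_coords(s_par, s_perp, f, r_grid, Delta_grid)
    mu_r = r_par / np.maximum(r, 1e-12)          # avoids 0/0 at the void center
    delta = np.interp(r, r_grid, delta_grid)
    Delta = np.interp(r, r_grid, Delta_grid)
    return b * delta + f / 3.0 * Delta + f * mu_r**2 * (delta - Delta)
```

For $f = 0$ the mapping reduces to the identity and the model returns $b\,\delta(r)$, which is a convenient consistency check.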


It remains to determine the real-space density profile $\delta(r)$ to be used in the previous equations. Various approaches
have been followed in the literature: they either make use of analytic fitting formulae like Eq. 90 (Paz et al., 2013;
Hamaus et al., 2015, 2016; Correa et al., 2019), calibrated measurements from simulations (Achitouv et al., 2017;
Nadathur et al., 2020), or a deprojection technique to determine it from the observed data directly (Pisani et al.,
2014; Hawken et al., 2017; Hamaus et al., 2020). The latter approach is based on the inverse Abel transform (Abel,
1842; Bracewell, 1999):

$$\xi(r) = -\frac{1}{\pi}\int_r^{\infty} \frac{d\xi_p^s(s_\perp)}{ds_\perp}\,\frac{ds_\perp}{\sqrt{s_\perp^2 - r^2}} , \tag{99}$$

exploiting the fact that the projected void-galaxy cross-correlation function in redshift space, $\xi_p^s(s_\perp) = \int \xi^s(\mathbf{s})\,ds_\parallel$,
is insensitive to RSD, which only act along the line-of-sight component $s_\parallel$ of the separation vector $\mathbf{s}$ (see Fig. 25).
Assuming linear bias, this provides the real-space density profile via $\delta(r) = \xi(r)/b$.
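A direct numerical evaluation of the inverse Abel transform has to deal with the integrable singularity at $s_\perp = r$. One possible way around it, sketched below, is the substitution $s_\perp^2 = r^2 + t^2$, which turns the integral into $\xi(r) = -\frac{1}{\pi}\int_0^\infty \xi_p'(s_\perp)/s_\perp\,dt$. The sketch assumes that the projected correlation function is available as a smooth callable `xi_p` (for binned data one would first interpolate or smooth) and that $r$ is larger than the finite-difference step `h`; both are assumptions of this example rather than part of the cited analyses.

```python
import numpy as np

def inverse_abel(r, xi_p, t_max=200.0, n=4001, h=1e-3):
    """Inverse Abel transform at a single radius r > h.

    Uses s = sqrt(r^2 + t^2), so that ds / sqrt(s^2 - r^2) = dt / s and
    xi(r) = -(1/pi) * int_0^t_max xi_p'(s) / s dt  (t_max truncates infinity).
    """
    t = np.linspace(0.0, t_max, n)
    s = np.sqrt(r**2 + t**2)
    dxi = (xi_p(s + h) - xi_p(s - h)) / (2.0 * h)    # central finite difference
    integrand = dxi / s
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t))
    return -integral / np.pi

# Check on a Gaussian: xi(r) = exp(-r^2/a^2) projects to a*sqrt(pi)*exp(-s^2/a^2)
a = 10.0
projected = lambda s: a * np.sqrt(np.pi) * np.exp(-(s / a) ** 2)
print(inverse_abel(5.0, projected), np.exp(-(5.0 / a) ** 2))
```

The Gaussian pair used here has a closed-form projection, which makes it a convenient test case before applying the routine to noisy measurements.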


Moreover, it is possible to extend this dynamical model to the quasi-linear regime via the so-called Gaussian
Streaming Model (GSM). Assuming the pairwise line-of-sight velocity $u_\parallel$ between void centers and galaxies to follow
a Gaussian distribution, the void-galaxy cross-correlation function in redshift space is given by (e.g., Paz et al., 2013;
Hamaus et al., 2015; Cai et al., 2016):

$$1 + \xi^s(\mathbf{s}) = \int \frac{1 + \xi(r)}{\sqrt{2\pi}\,\sigma_\parallel(r,\mu_r)} \exp\left\{-\frac{\left[u_\parallel - u(r)\,\mu_r\right]^2}{2\sigma_\parallel^2(r,\mu_r)}\right\} du_\parallel , \tag{100}$$

which additionally requires the pairwise velocity dispersion along the line of sight $\sigma_\parallel(r,\mu_r)$ as a model ingredient. We
refer to Hamaus et al. (2020) for a discussion on the advantages and disadvantages of the various modeling choices.


3.7.2 Sample selection 


The observational identification of voids requires a distribution of tracers of the large-scale structure, as obtained
from redshift surveys. Typically these tracers are galaxies with either spectroscopic or photometric redshift estimates,
but other tracer types, such as galaxy clusters (Pollina et al., 2019), the Ly-$\alpha$ forest (Stark et al., 2015; Krolewski
et al., 2018; Porqueres et al., 2019), or the 21cm emission from neutral hydrogen (White and Padmanabhan, 2017;
Endo et al., 2020) have been considered for void finding as well. These observations commonly optimize the target
selection based on their individual science cases, but voids can be extracted as a byproduct without additional
expense. Therefore, the sample selection for voids usually derives from the target tracer selection, and is rarely
optimized specifically for void detection (however, see van de Weygaert et al., 2011; Pisani et al., 2019, for more
details on the optimization of surveys for void detection). Nevertheless, existing survey data have proven very
valuable in providing void catalogs of high quality with significant sample sizes (e.g., Sutter et al., 2012b; Mao et al.,
2017b; Fang et al., 2019; Hamaus et al., 2020; Aubert et al., 2020; Nadathur et al., 2020).


Various techniques for the identification of voids have been presented in the literature (see Colberg et al., 2008; 
Cautun et al., 2018, for an overview of different methods). They either consider a full distribution of tracers in 
3D, or 2D projections along the line-of-sight direction. The former approach is typically applied to spectroscopic, 
the latter to photometric data, although both techniques can be used in each case. Moreover, some void finders 
search for spherical domains with tracer densities below a given threshold, while others locate void boundaries of 
arbitrary geometry in a non-parametric fashion. The latter can be achieved with a so-called watershed algorithm 
(Platen et al., 2007), which requires the definition of a density field from the distribution of tracer particles. The 
density field itself can be estimated in various ways, for example via grid interpolation or adaptive methods, such as 
Delaunay or Voronoi tessellation. As a result, one obtains a nearly space-filling distribution of voids in the large-scale 
structure with individual properties, such as their size, shape, density, or center location, which can be considered 
as cosmological observables. One of the most popular public codes implementing this approach is the Void
IDentification and Examination toolkit VIDE (Sutter et al., 2015; see footnote 13). It is based on the code ZOBOV (Neyrinck, 2008),
which performs a Voronoi tessellation and the watershed transform on a set of tracer particles. VIDE additionally 
handles the complexities arising from the survey geometry, which typically represents a masked light cone within a 
given redshift range. Voids intersecting with the boundary of the survey mask are usually excluded from the final 
void catalog. Furthermore, a cut on minimum void size based on the mean tracer separation is often used to mitigate 
the contamination from spurious voids that may arise via random density fluctuations (see Sect. 3.7.4). 


13 https://bitbucket.org/cosmicvoids/vide_public/wiki/Home

3.7.3 Measurements


The location of an astronomical object at cosmological distance is determined via its observed redshift $z$ and its
position on the sky, expressed in angular coordinates $\vartheta$ (right ascension) and $\varphi$ (declination). In order to identify
voids in the 3D distribution of tracers, we first need to perform a transformation to Cartesian coordinates $\mathbf{x}$ in
comoving space:

$$\mathbf{x}(z,\vartheta,\varphi) = (1+z)\,D_A(z) \begin{pmatrix} \cos\vartheta\cos\varphi \\ \sin\vartheta\cos\varphi \\ \sin\varphi \end{pmatrix} , \tag{101}$$


where $D_A(z)$ is the angular diameter distance to a tracer at redshift $z$. It depends on the expansion history of the
Universe via the Hubble function $H(z)$ and on the curvature of space via the parameter $\Omega_k$, as expressed in Eq. 14.
That equation can also be written as:

$$D_A(z) = \frac{c}{H_0\,(1+z)\sqrt{-\Omega_k}}\,\sin\left(\sqrt{-\Omega_k}\int_0^z \frac{H_0}{H(z')}\,dz'\right) , \tag{102}$$


where $c$ is the speed of light and $H_0 = H(z = 0)$ the Hubble constant. Thus, in order to perform the coordinate
transformation in Eq. 101 it is necessary to assume a particular cosmology. Within $\Lambda$CDM, this requires values
for the radiation, matter, and cosmological constant parameters $\Omega_r$, $\Omega_m$, and $\Omega_\Lambda$, which determine the curvature
parameter as $\Omega_k = 1 - \Omega_r - \Omega_m - \Omega_\Lambda$. The Hubble function is given by Eq. 8. Once the coordinate transformation
is performed, voids can be identified in comoving space. It is then possible to ascribe a volume $V$ and an effective
radius $R$ to every void. In particular, making use of a Voronoi tessellation, one can define these quantities via a sum
over the cell volumes $V_i$ of the individual tracer particles with index $i$ that belong to each void:

$$V = \sum_i V_i , \qquad R = \left(\frac{3}{4\pi}V\right)^{1/3} . \tag{103}$$


Moreover, one can define a volume-weighted barycenter from the tracers at location $\mathbf{x}_i$, which serves as a good
estimator for the geometric center of a void (e.g., Sutter et al., 2012b; Cautun et al., 2016; Stopyra et al., 2021):

$$\mathbf{X} = \frac{\sum_i \mathbf{x}_i V_i}{\sum_i V_i} . \tag{104}$$


Further properties, such as the inertia tensor with its eigenvalues and eigenvectors, the ellipticity, the minimum 
density, the density contrast, or the average density can be defined for each void based on its defining tracers (Sutter 
et al., 2015). 
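The coordinate transformation and the basic void properties defined above can be sketched in a few lines of Python. The example assumes a flat $\Lambda$CDM cosmology with radiation neglected (so that the curvature term of the distance relation drops out and $(1+z)D_A$ reduces to the line-of-sight comoving distance), a default $H_0 = 100\,$km/s/Mpc so that distances come out in $h^{-1}$Mpc, and angles given in degrees. Function names and arguments are illustrative, not those of any particular void finder.

```python
import numpy as np

C_KMS = 299792.458  # speed of light in km/s

def comoving_distance(z, omega_m, h0=100.0, n=2049):
    """(1+z) D_A(z) for flat LCDM, in Mpc (Mpc/h for h0 = 100 km/s/Mpc)."""
    zz = np.linspace(0.0, z, n)
    inv_E = 1.0 / np.sqrt(omega_m * (1.0 + zz) ** 3 + 1.0 - omega_m)
    return C_KMS / h0 * np.sum(0.5 * (inv_E[1:] + inv_E[:-1]) * np.diff(zz))

def sky_to_comoving(z, ra_deg, dec_deg, omega_m):
    """Comoving Cartesian position from redshift, right ascension, declination."""
    ra, dec = np.radians(ra_deg), np.radians(dec_deg)
    chi = comoving_distance(z, omega_m)
    return chi * np.array([np.cos(ra) * np.cos(dec),
                           np.sin(ra) * np.cos(dec),
                           np.sin(dec)])

def effective_radius(cell_volumes):
    """Radius of the sphere whose volume equals the summed Voronoi cell volumes."""
    return (3.0 * np.sum(cell_volumes) / (4.0 * np.pi)) ** (1.0 / 3.0)

def barycenter(positions, cell_volumes):
    """Volume-weighted mean of tracer positions; positions has shape (N, 3)."""
    w = np.asarray(cell_volumes, dtype=float)
    return (np.asarray(positions, dtype=float) * w[:, None]).sum(axis=0) / w.sum()

print(sky_to_comoving(0.5, 150.0, 2.0, omega_m=0.3))
```

Because the assumed value of `omega_m` enters every distance, a different fiducial cosmology changes the recovered void positions and sizes.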


The separations between void centers and tracers in comoving space can be calculated via their differences in the
angle on the sky $\delta\theta$ and in redshift $\delta z$ following Eq. 101:

$$s_\perp = (1+z)\,D_A(z)\,\delta\theta , \qquad s_\parallel = \frac{c}{H(z)}\,\delta z . \tag{105}$$

However, as both $D_A(z)$ and $H(z)$ depend on the assumed cosmological model, so do the separations. It is therefore
common practice to introduce two AP parameters $q_\perp$ and $q_\parallel$ that inherit the dependence on cosmology via:

$$q_\perp = \frac{s_\perp^*}{s_\perp} = \frac{D_A^*(z)}{D_A(z)} , \qquad q_\parallel = \frac{s_\parallel^*}{s_\parallel} = \frac{H(z)}{H^*(z)} , \tag{106}$$


where the quantities with an asterisk are evaluated in the true underlying cosmology, which is unknown. In the
special case where the assumed cosmology coincides with the true one, $q_\perp = q_\parallel = 1$. In turn, measuring $q_\perp$ and $q_\parallel$
provides a measurement of $D_A^*(z)$ and $H^*(z)$, respectively. However, without an absolute calibration scale the two
parameters remain degenerate in the AP test. Only their ratio, known as the AP parameter:

$$\varepsilon = \frac{q_\perp}{q_\parallel} = \frac{D_A^*(z)\,H^*(z)}{D_A(z)\,H(z)} , \tag{107}$$

can be determined, which provides a measurement of the product $D_A^*(z)H^*(z)$ (Sutter et al., 2012a; Hamaus et al.,
2016). Furthermore, the observed volume is proportional to $s_\parallel s_\perp^2$, which implies $R^* = q_\parallel^{1/3} q_\perp^{2/3} R$ for the true effective
void radius (Hamaus et al., 2020; Correa et al., 2021).
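To make the AP parameters concrete, the sketch below computes $q_\perp$, $q_\parallel$, and $\varepsilon$ for a fiducial and a hypothetical "true" cosmology. It assumes flat $\Lambda$CDM with radiation neglected and measures distances in units of $c/H_0$ (equivalently $h^{-1}$Mpc), so that $H_0$ drops out; the two values of $\Omega_m$ are arbitrary choices for illustration.

```python
import numpy as np

def hubble_E(z, omega_m):
    """H(z)/H0 in flat LCDM with radiation neglected."""
    return np.sqrt(omega_m * (1.0 + z) ** 3 + 1.0 - omega_m)

def comoving_distance_H0(z, omega_m, n=2049):
    """(1+z) D_A(z) in units of c/H0 for flat LCDM (trapezoidal rule)."""
    zz = np.linspace(0.0, z, n)
    y = 1.0 / hubble_E(zz, omega_m)
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(zz))

def ap_parameters(z, omega_m_fid, omega_m_true):
    """Return q_perp = D_A*/D_A, q_par = H/H*, and epsilon = q_perp / q_par."""
    q_perp = comoving_distance_H0(z, omega_m_true) / comoving_distance_H0(z, omega_m_fid)
    q_par = hubble_E(z, omega_m_fid) / hubble_E(z, omega_m_true)
    return q_perp, q_par, q_perp / q_par

def true_radius(R, q_perp, q_par):
    """Rescaled effective radius R* = q_par^(1/3) * q_perp^(2/3) * R."""
    return q_par ** (1.0 / 3.0) * q_perp ** (2.0 / 3.0) * R

q_perp, q_par, eps = ap_parameters(0.5, omega_m_fid=0.30, omega_m_true=0.35)
print(q_perp, q_par, eps, true_radius(20.0, q_perp, q_par))
```

When both cosmologies coincide, all three parameters equal one and void radii are unchanged.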
In practice, the AP test is applied to measurements of the void-galaxy cross-correlation function $\xi^s(s_\perp, s_\parallel)$ in
redshift space. It is customary to use the Landy and Szalay (1993) estimator for this purpose:

$$\hat\xi^s(s_\perp, s_\parallel) = \frac{\langle D_v D_g\rangle - \langle D_v R_g\rangle - \langle R_v D_g\rangle + \langle R_v R_g\rangle}{\langle R_v R_g\rangle} , \tag{108}$$

where the angled brackets indicate normalized pair counts of void-center and galaxy positions in the data ($D_v$, $D_g$)
and in random catalogs ($R_v$, $R_g$) without spatial correlations. The number of random objects has to be large enough
to guarantee an unbiased estimate of $\xi^s$; it is typically set one to two orders of magnitude higher than the number of
observed objects of each kind. From Eq. 108 it is then straightforward to estimate the projected correlation function
via line-of-sight integration:

$$\hat\xi_p^s(s_\perp) = \int \hat\xi^s(s_\perp, s_\parallel)\,ds_\parallel . \tag{109}$$
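Given binned pair counts, the estimator and its line-of-sight projection reduce to simple array operations. The sketch below assumes that raw pair counts have already been accumulated on a common grid of $(s_\perp, s_\parallel)$ bins (first axis $s_\perp$, second axis $s_\parallel$) and that each count is normalized by the total number of possible pairs; the pair counting itself is not shown, and all names are hypothetical.

```python
import numpy as np

def landy_szalay(DvDg, DvRg, RvDg, RvRg, n_v, n_g, n_rv, n_rg):
    """Landy-Szalay estimate of the void-galaxy cross-correlation function.

    Inputs are raw pair counts per bin; n_v, n_g are the numbers of voids and
    galaxies, n_rv, n_rg the sizes of the corresponding random catalogs.
    """
    dd = np.asarray(DvDg, dtype=float) / (n_v * n_g)
    dr = np.asarray(DvRg, dtype=float) / (n_v * n_rg)
    rd = np.asarray(RvDg, dtype=float) / (n_rv * n_g)
    rr = np.asarray(RvRg, dtype=float) / (n_rv * n_rg)
    return (dd - dr - rd + rr) / rr

def project_along_los(xi_2d, s_par_edges):
    """Projected correlation function: sum of xi * bin width over s_par bins."""
    widths = np.diff(np.asarray(s_par_edges, dtype=float))
    return np.sum(xi_2d * widths[None, :], axis=1)
```

If the data pair counts follow the same distribution as the randoms, the estimator returns zero in every bin, which is a useful null test.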
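Application of the inverse Abel transform from Eq. 99 then provides the real-space correlation function $\xi(r)$ (see
Fig. 25), which is needed as a model ingredient for $\xi^s(\mathbf{s})$, as in Eqs. 97 or 100. For example, assuming linear bias,
Eq. 97 can be written as:

$$\xi^s(s_\perp, s_\parallel) = \xi(r) + \frac{1}{3}\frac{f}{b}\,\bar\xi(r) + \frac{f}{b}\,\mu_r^2\left[\xi(r) - \bar\xi(r)\right] , \tag{110}$$

where $\bar\xi(r) = 3r^{-3}\int_0^r \xi(r')\,r'^2\,dr'$. This can be compared to the measured $\hat\xi^s(s_\perp, s_\parallel)$ assuming a Gaussian likelihood:

$$L(\hat\xi^s|\Theta) \propto \exp\left\{-\frac{1}{2}\sum_{i,j}\left[\hat\xi^s(\mathbf{s}_i) - \xi^s(\mathbf{s}_i|\Theta)\right]\hat{C}_{ij}^{-1}\left[\hat\xi^s(\mathbf{s}_j) - \xi^s(\mathbf{s}_j|\Theta)\right]\right\} , \tag{111}$$

with model parameter vector $\Theta$ and covariance matrix:

$$\hat{C}_{ij} = \left\langle\left[\hat\xi^s(\mathbf{s}_i) - \langle\hat\xi^s(\mathbf{s}_i)\rangle\right]\left[\hat\xi^s(\mathbf{s}_j) - \langle\hat\xi^s(\mathbf{s}_j)\rangle\right]\right\rangle . \tag{112}$$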


Here, angled brackets imply averages over an ensemble of observations. Because voids are spatially exclusive, they 
represent independent regions of the large-scale structure and the covariance matrix can be estimated via jackknife 
resampling of the observed sample of voids (e.g., Paz et al., 2013; Hamaus et al., 2015; Cai et al., 2016; Correa et al., 
2019). 
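A delete-one jackknife over voids and the corresponding Gaussian log-likelihood can be sketched as follows. The example makes a simplifying assumption that is not part of the text: the stacked data vector is treated as the plain average of per-void contributions, so that removing one void amounts to recomputing a mean. Real analyses recompute the full estimator for each jackknife sample; the function names here are hypothetical.

```python
import numpy as np

def jackknife_covariance(xi_per_void):
    """Delete-one jackknife covariance of the mean data vector.

    xi_per_void has shape (N_voids, N_bins), one row per void contribution.
    """
    x = np.asarray(xi_per_void, dtype=float)
    n = x.shape[0]
    total = x.sum(axis=0)
    loo_means = (total[None, :] - x) / (n - 1)    # mean with void k removed
    diff = loo_means - loo_means.mean(axis=0)
    return (n - 1) / n * diff.T @ diff

def log_likelihood(xi_data, xi_model, cov):
    """Gaussian log-likelihood up to an additive constant."""
    resid = np.asarray(xi_data, dtype=float) - np.asarray(xi_model, dtype=float)
    return -0.5 * resid @ np.linalg.solve(cov, resid)
```

For a plain mean, this jackknife estimate coincides with the sample covariance divided by the number of voids. Solving the linear system instead of explicitly inverting the covariance matrix is a common choice for numerical stability.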


The likelihood can be used to determine the AP parameter $\varepsilon$, which is equivalent to a measurement of the product
of angular diameter distance $D_A(z)$ and Hubble expansion $H(z)$ at redshift $z$. Because these two quantities depend
on the cosmological model via Eqs. 102 and 8, measurements of $\varepsilon$ can be converted to constraints on the cosmological
parameters that enter these two equations. A variation of $\varepsilon$ corresponds to a change in distance ratios along and
perpendicular to the line of sight, which can be described as a geometric distortion of void shapes. However, voids
are additionally affected by RSD due to the peculiar velocity flows in their immediate surroundings, as described
in Sect. 3.7.1. The magnitude of these velocities and hence the strength of dynamic distortions is controlled by the
growth rate parameter $f$, which enters in the model Eq. 110 via Eq. 93. In order to properly model the average
shapes of voids, or equivalently the void-galaxy cross-correlation function in redshift space, geometric and dynamic
distortions must be accounted for at the same time (Hamaus et al., 2015, 2016). Fortunately, the two types of
distortions influence $\xi^s(s_\perp, s_\parallel)$ in fundamentally different ways, such that there is no significant degeneracy between
the parameters $\varepsilon$ and $f/b$.


3.7.4 Systematic effects 

The mass distribution on cosmological scales is predominantly constituted by invisible dark matter. Large-scale
structure surveys merely allow us to infer this distribution via luminous tracers of the mass, but this inference is
subject to bias, statistical noise, and other sources of error. As we rely on the spatial distribution of tracers for void
identification, these complications necessarily propagate into the properties of voids as a source of systematic effects.
The main known systematics are summarized in the list below.


Clustering bias. The over-densities of tracers $\delta_t$ generally differ from the fluctuations in the matter density field
$\delta_m$, a phenomenon referred to as tracer bias (Desjacques et al., 2018). At linear order in the perturbations this
difference is quantified by a multiplicative constant $b$, denoted as linear bias, with $\delta_t = b\,\delta_m$ (Kaiser, 1984). For
example, luminous red galaxies (LRGs) typically have $b > 1$, because they populate relatively massive halos that
form in the most over-dense environments (e.g., Gil-Marin et al., 2015; Zhai et al., 2017). Therefore, voids identified 
in the distribution of LRGs exhibit deeper interiors and higher compensation ridges compared to voids identified 
in the dark matter density field (Sutter et al., 2014b; Pollina et al., 2017, 2019). As a consequence, basic void 
properties, such as their effective radius and density profile, depend on the bias of the tracer sample considered for 
their identification. 


Stochasticity. While the distribution of dark matter can be seen as a collisionless fluid, tracers of the mass consist 
of discrete objects, such as galaxies. Therefore, the density field of tracers $\delta_t$ must be estimated from a finite number
of objects per volume element, which is subject to discreteness noise, also referred to as shot noise. Typically, this 
shot noise is assumed to obey Poisson statistics, but corrections due to the finite extent of tracers and their nonlinear 
clustering appear (Hamaus et al., 2010; Baldauf et al., 2013; Paech et al., 2017; Ginzburg et al., 2017; Friedrich et al., 
2021). Voids are necessarily affected by shot noise as well, if they are defined via tracer statistics. For example, 
even in a tracer distribution that is drawn from a homogeneous density field, chance fluctuations due to shot noise 
can result in spurious void detections (Neyrinck, 2008). Therefore, one may expect that not all voids identified in a 
real tracer distribution are genuine, but that there is a contamination of spurious voids depending on the sparsity 
of the considered tracer sample. With the help of simulations and mock catalogs the contamination fraction can be 
assessed, exploiting the fact that various void properties distinguish genuine from spurious voids. The use of machine 
learning methods is particularly effective to minimize the contamination by spurious voids (Cousinou et al., 2019). 


Nonlinear RSD. By design, large-scale structure surveys infer distances via the measured redshifts of sources, 
which are distorted due to their peculiar motion along the line of sight (Kaiser, 1984). While just a decade ago
peculiar velocities were considered the strongest systematic effect limiting the extractable information from the
void density profile (e.g., Lavaux and Wandelt, 2012; Sutter et al., 2012a), RSD models for voids have now reached a
level of maturity that allows peculiar velocities to be exploited as an independent source of information. At the linear level, which
is most relevant for voids, these RSDs can be modeled very accurately, as discussed in Sect. 3.7.1, but their nonlinear 
regime is more complex and difficult to understand from first principles. A well-known example of an extreme type 
of nonlinear RSD is the so-called Finger-of-God (FoG) effect (Jackson, 1972). It arises around the most massive 
structures in the Universe observed in redshift space, galaxy clusters, and appears as an elongated feature along the 
line of sight. This apparent elongation is RSD caused by the virial motion of the cluster member galaxies. While 
the occurrence of FoGs inside voids is less likely, they can disrupt their over-dense boundaries. In turn, this can 
cause spurious mergers or segmentation of voids, preferentially along the line-of-sight direction, which results in an 
anisotropic selection effect (Pisani et al., 2015b; Correa et al., 2022). 


Redshift error. The measurement of redshift is error-prone itself. While this can be largely neglected for the 
high-resolution spectra obtained with spectroscopic redshift surveys, photometric surveys are subject to a relatively 
large photo-z scatter that often amounts to a few percent uncertainty in redshift. Translated to a distance scale, 
this typically corresponds to several tens of Mpc and therefore strongly impacts the identification of voids whose 
extent is of the same order. 2D void finders have specifically been designed to reduce the impact of this error from 
photometric surveys (Sánchez et al., 2017; Kovacs et al., 2019; Vielzeuf et al., 2021). Another option is to rely on 
tracers with higher photo-z accuracy, such as galaxy clusters, for the identification of voids (Pollina et al., 2019). 


Survey boundary. Redshift surveys typically only observe a fraction of the full sky. In addition, objects in the 
foreground, such as stars or the plane of the Galaxy, have to be masked out. Together with the finite redshift range 
of the survey, this creates a complex geometry of the observed cosmological volume. Voids that intersect with a 
survey boundary are only partially observed and hence cannot be used for further analysis. This constraint concerns 
the largest voids most severely, as they are the most likely to extend beyond the edges of the survey. Thus, survey 
boundaries impact the detectable distribution of void sizes in a systematic way, which is not straightforward to 
predict (Sutter et al., 2014c). To mitigate this effect, it is desirable to survey large contiguous fractions of the sky, 
and to discard voids that are too close to the survey boundary. 


Figure 27: Left: Measurement of the void-galaxy cross-correlation function $\xi^s(s_\perp, s_\parallel)$ from voids in the final
BOSS data (color scale with black contours) and the best-fit model (white contours). Right: Constraints on the
parameters $\Omega_m$ and $f\sigma_8$ obtained from modeling the data in the left panel. A white cross indicates the best fit and
dashed lines the mean parameter values obtained by the Planck Collaboration et al. (2020a). Images reproduced
with permission from Hamaus et al. (2020), copyright by IOP Publishing.


In reality, the systematic effects mentioned above do not occur in isolation, but impact the identification of voids
jointly. It is therefore difficult to address them on purely theoretical grounds. As an alternative, various empirical
approaches to handle systematics have been adopted in the literature. This can be realized in essentially two different
ways: first, at the level of the data, such as performing a cleaning procedure to select voids based on their size and 
depth (e.g., Contarini et al., 2019; Ronconi et al., 2019), applying projections within redshift slices (e.g., Sánchez 
et al., 2017), or implementing a velocity reconstruction to control the void selection (Nadathur et al., 2020). Second, 
at the level of the model, which can be extended by additional nuisance parameters to allow some more flexibility 
(e.g., Hamaus et al., 2020; Paillas et al., 2021). Even though such extra parameters may not be uniquely associated 
with a given systematic effect, they can be marginalized over for the cosmological interpretation of the analysis. 


3.7.5 Main results and forecasts 


Voids have been considered for cosmological forecasts and constraints in various ways throughout the literature.
Constraints from current data based on voids mainly rely on the void density profile; the theoretical modeling of
the void size function has only recently reached maturity and will show its full power with larger samples of voids
from the next generation of surveys. Therefore, here we focus on one of the most established applications to probe
cosmology with voids: the AP test with the void-galaxy cross-correlation function $\xi^s(s_\perp, s_\parallel)$.


Figure 27 shows the results obtained by performing an AP test with voids identified in the final data release of
the Baryon Oscillation Spectroscopic Survey (BOSS, Dawson et al., 2013). The left panel contains the measured
$\hat\xi^s(s_\perp, s_\parallel)$ in bins of void-centric separations along and perpendicular to the line of sight, and the best-fit model
indicated by white contour lines. Application of an MCMC sampler allows one to retrieve the posterior distribution
of the model parameters $\varepsilon$ and $f/b$. Then, assuming a flat $\Lambda$CDM cosmology, $\varepsilon$ can be converted to $\Omega_m$, the only
free parameter in the product $D_A H$ within that model (since $\Omega_r$ can be neglected and $\Omega_k = 0$). Furthermore, with a
measurement of the linear clustering amplitude of the tracer galaxies in BOSS, which is determined by the product of
their bias $b$ and the amplitude of linear matter fluctuations $\sigma_8$, the ratio $f/b$ can be converted to the more commonly
quoted combination $f\sigma_8$. Because $\sigma_8$ is defined in terms of $8\,h^{-1}$Mpc, the posterior on $f\sigma_8$ should be marginalized
over the Hubble constant $h$, which is often neglected (Sanchez, 2020).

The constraints on $\Omega_m$ and $f\sigma_8$ from the AP test with voids in the final BOSS data are shown in the right
panel of Fig. 27. It demonstrates how competitive this relatively new method is, for example when compared to
the more traditional approach that focuses on the pairwise clustering of galaxies (e.g., Alam et al., 2017). The
latter is more challenging to model on small scales, due to the complex velocity statistics of galaxies in over-dense
environments. However, on large scales it is imprinted with a characteristic scale of about $105\,h^{-1}$Mpc by the
BAO feature that emerged during the radiation-dominated epoch of the early Universe, which can be used as a
standard ruler to constrain $D_A(z)$ and $H(z)$ individually. There are strong indications that the combination of such
measurements with the AP test from voids can greatly improve the precision on cosmological parameters, thanks to
their complementarity (e.g., Nadathur et al., 2020; Paillas et al., 2021; Kreisch et al., 2021).
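The two parameter conversions behind the right panel of Fig. 27 can be illustrated with a short sketch. It assumes flat $\Lambda$CDM with radiation neglected, in which case the dimensionless product $D_A(z)H(z)/c$ depends on $\Omega_m$ alone and increases monotonically with it, so that a measured $\varepsilon$ can be inverted by bisection. The growth conversion simply multiplies $f/b$ by an independently measured $b\sigma_8$. The numbers in the usage lines are placeholders, not published measurements.

```python
import numpy as np

def da_times_h(z, omega_m, n=2049):
    """Dimensionless product D_A(z) H(z) / c in flat LCDM (H0 cancels)."""
    zz = np.linspace(0.0, z, n)
    E = np.sqrt(omega_m * (1.0 + zz) ** 3 + 1.0 - omega_m)
    chi = np.sum(0.5 * (1.0 / E[1:] + 1.0 / E[:-1]) * np.diff(zz))
    return chi * E[-1] / (1.0 + z)

def omega_m_from_epsilon(eps, z, omega_m_fid, lo=0.01, hi=0.99, tol=1e-8):
    """Solve D_A* H* (Omega_m) = eps * D_A H (fiducial) by bisection."""
    target = eps * da_times_h(z, omega_m_fid)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if da_times_h(z, mid) < target:    # product grows with Omega_m
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def f_sigma8(f_over_b, b_sigma8):
    """Combine f/b from voids with b*sigma8 from galaxy clustering."""
    return f_over_b * b_sigma8

# Placeholder inputs for illustration only
print(omega_m_from_epsilon(1.01, z=0.5, omega_m_fid=0.3))
print(f_sigma8(0.4, 1.2))
```

In a full analysis these conversions are applied to every sample of the posterior chain rather than to the best-fit values alone, so that parameter uncertainties propagate correctly.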


Figure 28: Comparison of constraints on growth via $f\sigma_8$ and geometry via $D_A H$ (68% confidence intervals)
obtained from cosmic voids in the literature; references are ordered chronologically in the figure legend. Gray
lines with shaded error bands show the Planck Collaboration et al. (2020a) baseline result as a reference, with
corresponding values of $D'_A H'$. Filled markers indicate growth rate measurements without consideration of the
AP effect, while open markers include the AP test. The different line styles of error bars indicate various degrees
of model assumptions made: model-independent (solid), calibrated on simulations (dashed), calibrated on mocks
(dotted), calibrated on simulations and mocks (dash-dotted).

Figure 28 summarizes the cosmological constraints that have been obtained from cosmic voids as a stand-alone
probe in the literature. Despite being a young field of research, it has been blossoming, with increasingly
accurate techniques applied to very different surveys, including SDSS (Sutter et al., 2012a), BOSS (Sutter et al.,
2014d; Hamaus et al., 2016; Mao et al., 2017a), eBOSS (Hawken et al., 2020; Aubert et al., 2020; Nadathur et al.,
2020), VIPERS (Hawken et al., 2017), and 6dFGS (Achitouv et al., 2017). All measurements of $D_A H$ are based on
the AP test, while some constraints on $f\sigma_8$ are only derived from dynamic distortions of void shapes and assume
a fiducial cosmology with a fixed $D_A H$. The method becomes particularly powerful towards higher redshift, where
the observed volume and hence the available sample size of voids grows larger. Moreover, the product $D_A(z)H(z)$ is
an increasing function of redshift, so its measurement becomes more sensitive to changes in cosmological parameters
at higher $z$. These two trends will eventually be overcome by the declining amplitude of nonlinear fluctuations
in the matter density field and the absence of observable tracers of the latter. However, upcoming next-generation
surveys, such as DESI (DESI Collaboration et al., 2016), Euclid (Laureijs et al., 2011), PFS (Takada et al.,
2014), the Nancy Grace Roman Space Telescope (Spergel et al., 2015), the Vera Rubin Observatory (LSST Science
Collaboration et al., 2009), and SPHEREx (Doré et al., 2014) are expected to obtain void catalogs of unprecedented
size, containing on the order of $10^5$ objects each (Pisani et al., 2019; Hamaus et al., 2021). Compared to the current
state of the art, this corresponds to an increase of about two orders of magnitude. Therefore, we expect the next
generation of surveys to initiate an era of voids in the pursuit of precision cosmology.
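
The statement that the product $D_A(z)H(z)$ grows with redshift can be checked numerically. The following is a minimal sketch, assuming a flat ΛCDM background with an illustrative $\Omega_m = 0.31$ (an assumption, not a value quoted in the text) and taking $D_A$ to be the comoving angular diameter distance, so that $D_A H/c$ is dimensionless and independent of $h$. The function names are hypothetical.

```python
import math

def e_of_z(z, omega_m):
    """Dimensionless Hubble rate E(z) = H(z)/H0 for flat LCDM (assumed model)."""
    return math.sqrt(omega_m * (1.0 + z) ** 3 + 1.0 - omega_m)

def da_times_h(z, omega_m=0.31, n_steps=2000):
    """Return D_A(z) H(z) / c, with D_A the comoving angular diameter distance.

    In a flat universe D_A H / c = E(z) * integral_0^z dz' / E(z'),
    evaluated here with the trapezoidal rule.
    """
    if z == 0.0:
        return 0.0
    dz = z / n_steps
    integral = 0.0
    for i in range(n_steps):
        z_lo, z_hi = i * dz, (i + 1) * dz
        integral += 0.5 * dz * (1.0 / e_of_z(z_lo, omega_m) + 1.0 / e_of_z(z_hi, omega_m))
    return e_of_z(z, omega_m) * integral

if __name__ == "__main__":
    for z in (0.2, 0.5, 1.0, 2.0):
        print(f"z = {z:.1f}: D_A H / c = {da_times_h(z):.3f}")
```

Evaluating the function for two values of `omega_m` at fixed redshift gives a rough feeling for how the sensitivity to the matter density changes with $z$.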

3.8 Neutral Hydrogen Intensity Mapping


Traditionally, large-scale structure surveys aim to detect individual galaxies in three dimensions. This involves 
measuring the redshift of each galaxy as well as its angular position on the sky, and then creating a catalog and a 
corresponding 3D map. This procedure has been routinely used by optical galaxy surveys like SDSS and has led to 
constraints on dark energy, gravity, and the initial conditions of the Universe (see, for example, Beutler et al., 2012; 
Alam et al., 2021; Mueller et al., 2021). An alternative proposal is to map the large-scale structure of the Universe 
using the redshifted 21cm line from the spin-flip transition in neutral hydrogen (HI) with radio telescopes (Battye
et al., 2004; Chang et al., 2008; Loeb and Wyithe, 2008; Mao et al., 2008; Peterson et al., 2009; Seo et al., 2010; 
Ansari et al., 2012). 


3.8.1 Basic idea and equations 


The HI intensity mapping technique does not require the often difficult and expensive detection of individual galaxies.
Instead, it maps the entire HI flux coming from many galaxies together in large 3D pixels, across the sky and along
cosmic time (see Fig. 29). With radio telescope arrays, the HI intensity mapping method has the potential to provide the
largest map of the Universe back to ~1 billion years after the Big Bang. The data can then be used for precision 
cosmology and galaxy evolution studies (Kovetz et al., 2020; Ahmed et al., 2019). 


A number of HI intensity mapping experiments are expected to come online in the next few years, with some of them
already working with pathfinder data. These include the proposed MeerKLASS survey using the SKA Observatory’s
MeerKAT precursor (Santos et al., 2017), FAST (Hu et al., 2020), BINGO (Battye et al., 2013; Wuensche, 2019), 
CHIME (Bandura et al., 2014), HIRAX (Newburgh et al., 2016), Tianlai (Li et al., 2020; Wu et al., 2021), PUMA 
(Slosar et al., 2019), and CHORD (Vanderlinde et al., 2019). Existing experiments include HI intensity mapping 
surveys performed with the Green Bank Telescope (GBT) (Chang et al., 2010; Switzer et al., 2013, 2015; Masui 
et al., 2013; Wolz et al., 2022) and Parkes (Anderson et al., 2018). 


At cosmological distances, the 21cm line is redshifted to very low frequencies, which alleviates the danger of 
line confusion that often plagues other lines. There is a one-to-one correspondence of observing frequency, $\nu$, with
redshift, $z$, given by:

\begin{equation}
\nu = \frac{1420.4}{1+z}\,\mathrm{MHz} . \tag{113}
\end{equation}
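
As a quick illustration of Eq. (113), the sketch below converts between observing frequency and redshift. The function names are hypothetical; the only physical input is the rest-frame frequency of 1420.4 MHz. It can be used, for instance, to check that 1136 MHz corresponds to $z \approx 0.25$, the value used for the simulated maps shown in this section.

```python
NU_21_MHZ = 1420.4  # rest-frame frequency of the 21cm line in MHz, as in Eq. (113)

def redshift_from_frequency(nu_mhz):
    """Invert Eq. (113): z = 1420.4 / nu - 1, with nu in MHz."""
    if nu_mhz <= 0.0:
        raise ValueError("frequency must be positive")
    return NU_21_MHZ / nu_mhz - 1.0

def frequency_from_redshift(z):
    """Eq. (113): observed frequency in MHz of 21cm emission from redshift z."""
    return NU_21_MHZ / (1.0 + z)

if __name__ == "__main__":
    for nu in (1136.0, 900.0, 580.0):
        print(f"nu = {nu:7.1f} MHz  ->  z = {redshift_from_frequency(nu):.2f}")
```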


Figure 29: From individual galaxies (left) to HI intensity maps (right). Using the intensity mapping technique
we can map the entire HI flux from many galaxies together in large 3D pixels, and produce low angular resolution
HI brightness temperature maps that retain the large-scale statistical information. This figure was produced using
the MultiDark simulations (Klypin et al., 2016; Knebe et al., 2018) and the methods in Cunnington et al. (2020b).

Figure 30: Simulated full sky maps of different 21cm foreground components at a frequency of 1136 MHz
($z = 0.25$). The frequency dependence of these foregrounds can be approximated by power laws with a running
spectral index. Image reproduced with permission from Cunnington et al. (2019), copyright by MNRAS.


For this reason, there is no need to detect and catalog individual galaxies. Looking at some 3D region (voxel)
on the sky, the radio telescope receives the total 21cm intensity from that region, as demonstrated in Fig. 29. This 
is a proxy for the total amount of hydrogen in the voxel, which is then assumed to be a (biased) tracer for the total 
matter density. While the telescope beam can be quite large and erase the small-scale structure, the large-scale 
statistical information is retained. From Earth, the 21cm line is measurable up to very high redshifts, z ~ 50, and 
could reach z ~ 200 with a lunar instrument (Furlanetto et al., 2006). This provides a unique opportunity for 
cosmology and astrophysics studies at high redshifts, where traditional galaxy surveys become shot noise limited. 


It is important to consider the mode of operation of the telescope array. Purpose-built HI intensity mapping 
experiments like CHIME and HIRAX are interferometers with elements that are closely packed together. Sparse 
arrays like MeerKAT and SKA-MID cannot provide enough short baselines to probe cosmological scales when used 
in interferometric mode. Instead, they need to operate in “single-dish” mode (Battye et al., 2013; Wang et al., 
2021), where the array is used as a collection of scanning auto-correlation dishes. This is necessary in order to map 
cosmological scales with sufficient sensitivity (Bull et al., 2015; Santos et al., 2015; SKA Cosmology SWG, 2020). 


A major challenge for the HI intensity mapping method is the presence of strong astrophysical emission, or
foreground contamination: 21cm foregrounds such as Galactic synchrotron (Zheng et al., 2017), point sources, and
free-free emission are bright in the relevant frequency ranges and can be orders of magnitude stronger than the
cosmological HI signal (see Fig. 30 and top panel of Fig. 31). Hence, they have to be removed.


3.8.1.1 Modeling the observed HI signal 


We consider the 3D power spectrum as our main observable, and follow the formalism used in optical galaxy survey
analyses. As for optical galaxies, redshift space distortions (RSD) introduce anisotropies in the observed HI
power spectrum. In order to account for this, we consider the power spectrum as a function of redshift $z$, $k$, and
$\mu$, where $k$ is the amplitude of the wave vector and $\mu$ the cosine of the angle between the wave vector and the
line-of-sight (LoS) direction. We model RSD by considering the Kaiser effect (Kaiser, 1987), which is a large-scale
effect dependent on the growth rate, $f$. To linear order, the anisotropic HI power spectrum can be written as:

\begin{equation}
P_{\rm HI}(k,\mu) = \left(\bar{T}_{\rm HI} b_{\rm HI} + \bar{T}_{\rm HI} f \mu^2\right)^2 P_{\rm M}(k) + P_{\rm SN} , \tag{114}
\end{equation}


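where $P_{\rm SN} = \bar{T}_{\rm HI}^2/\bar{n}$ is the shot noise, $\bar{n}$ is the number density of objects, $P_{\rm M}(k)$ is the underlying matter power
spectrum, $b_{\rm HI}$ is the HI bias, and $\bar{T}_{\rm HI}$ is the mean HI brightness temperature (Chang et al., 2010; Battye et al., 2013):

\begin{equation}
\bar{T}_{\rm HI}(z) = 44\,\mu{\rm K} \left(\frac{\Omega_{\rm HI}(z)\,h}{2.45\times 10^{-4}}\right) \frac{(1+z)^2}{H(z)/H_0} . \tag{115}
\end{equation}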


The $P_{\rm SN}$ contribution is expected to be subdominant (smaller than the thermal noise of the telescope) and is usually
neglected (Villaescusa-Navarro et al., 2018). The HI abundance and clustering properties have been studied using
simulations and semi-analytical modeling (see, e.g., Villaescusa-Navarro et al., 2018; Spinelli et al., 2020). In general, 
the clustering of HI can be accurately described by perturbative methods (Castorina and White, 2019), and maps 
can be constructed with N-body and approximate methods based on the halo model (Alonso et al., 2014; Villaescusa- 
Navarro et al., 2014; Padmanabhan et al., 2016, 2017; Carucci et al., 2017; Spinelli et al., 2022; Avila et al., 2022). For 
interpreting current and forthcoming observations, there is a pressing need to work towards end-to-end simulations 
including observational effects (Spinelli et al., 2022). 
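
Equations (114) and (115) can be sketched in a few lines of code. The example below is a minimal illustration, not a substitute for the modeling codes cited above: it assumes a flat ΛCDM background, approximates the growth rate as $f = \Omega_m(z)^{0.55}$, and uses illustrative values ($\Omega_{\rm HI} = 5\times10^{-4}$, $h = 0.67$, $b_{\rm HI} = 1$, and a single matter power spectrum value) that are assumptions rather than numbers taken from the text. All function names are hypothetical.

```python
import math

def e_of_z(z, omega_m):
    """E(z) = H(z)/H0 for a flat LCDM background (assumed model)."""
    return math.sqrt(omega_m * (1.0 + z) ** 3 + 1.0 - omega_m)

def mean_brightness_temperature(z, omega_hi, h, omega_m=0.31):
    """Eq. (115): mean HI brightness temperature in micro-Kelvin."""
    return 44.0 * (omega_hi * h / 2.45e-4) * (1.0 + z) ** 2 / e_of_z(z, omega_m)

def growth_rate(z, omega_m=0.31, gamma=0.55):
    """Growth rate approximated as f = Omega_m(z)^gamma (assumption of this sketch)."""
    omega_m_z = omega_m * (1.0 + z) ** 3 / e_of_z(z, omega_m) ** 2
    return omega_m_z ** gamma

def hi_power_spectrum(p_matter, mu, t_hi, b_hi, f, p_shot=0.0):
    """Eq. (114): linear (Kaiser) HI power spectrum in redshift space.

    p_matter is the matter power spectrum P_M(k) at the wavenumber of interest;
    the result carries the units of t_hi squared times those of p_matter.
    """
    return (t_hi * b_hi + t_hi * f * mu ** 2) ** 2 * p_matter + p_shot

if __name__ == "__main__":
    z = 0.5
    t_hi = mean_brightness_temperature(z, omega_hi=5.0e-4, h=0.67)
    f = growth_rate(z)
    for mu in (0.0, 0.5, 1.0):
        p_hi = hi_power_spectrum(p_matter=1.0e4, mu=mu, t_hi=t_hi, b_hi=1.0, f=f)
        print(f"mu = {mu:.1f}: P_HI = {p_hi:.3e} muK^2 (Mpc/h)^3")
```

The shot noise term defaults to zero, in line with the statement that it is usually neglected.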


3.8.1.2 The telescope beam effect 


The telescope beam introduces one of the main instrumental effects in the case of single-dish intensity mapping 
experiments. We can model this effect using a damping term dependent on the physical smoothing scale of the 
beam (see, e.g., Battye et al., 2013; Villaescusa-Navarro et al., 2017). Assuming the telescope beam can be modeled 
as a Gaussian, this scale is defined as $R_{\rm beam} = \sigma_\theta\, r(z)$, where $\sigma_\theta = \theta_{\rm FWHM}/(2\sqrt{2\ln 2})$, $\theta_{\rm FWHM} \approx \lambda/D_{\rm dish}$ is the
full-width-half-maximum of the beam for a dish of diameter $D_{\rm dish}$ at observation wavelength $\lambda = 21(1+z)$ cm, and $r(z)$ is the
comoving distance to redshift $z$. We emphasize that the angular resolution of single-dish surveys is very low, of
the order of ~1 deg, while interferometers have much better angular resolution.


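The Fourier transform of the telescope beam damping term is:

\begin{equation}
\tilde{B}_\perp(k,\mu) = \exp\left[\frac{-k^2 R_{\rm beam}^2 (1-\mu^2)}{2}\right] , \tag{116}
\end{equation}

and the power spectrum becomes:

\begin{equation}
P_{\rm HI}(k,\mu) = \tilde{B}_\perp^2(k,\mu) \times \left[\left(\bar{T}_{\rm HI} b_{\rm HI} + \bar{T}_{\rm HI} f \mu^2\right)^2 P_{\rm M}(k) + P_{\rm SN}\right] . \tag{117}
\end{equation}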


For surveys that are limited in frequency resolution, a similar effect will occur on the small radial scales. In cases 
where this might be relevant, a way to account for it is described in Blake (2019). 
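
To give a feeling for the size of the beam effect, the following sketch evaluates the smoothing scale and the damping term of Eq. (116). It is a minimal illustration under stated assumptions: a flat ΛCDM background with $\Omega_m = 0.31$, distances in $h^{-1}$Mpc, and a dish diameter of 13.5 m for a MeerKAT-like dish (a value not quoted in the text). The function names are hypothetical.

```python
import math

C_OVER_H0 = 2997.92458  # Hubble distance c/H0 in Mpc/h

def e_of_z(z, omega_m):
    """E(z) = H(z)/H0 for a flat LCDM background (assumed model)."""
    return math.sqrt(omega_m * (1.0 + z) ** 3 + 1.0 - omega_m)

def comoving_distance(z, omega_m=0.31, n_steps=2000):
    """Comoving distance r(z) in Mpc/h, using the trapezoidal rule."""
    dz = z / n_steps
    total = 0.0
    for i in range(n_steps):
        total += 0.5 * dz * (1.0 / e_of_z(i * dz, omega_m)
                             + 1.0 / e_of_z((i + 1) * dz, omega_m))
    return C_OVER_H0 * total

def theta_fwhm(z, dish_diameter_m):
    """Beam FWHM in radians, theta ~ lambda / D_dish with lambda = 0.21 (1+z) m."""
    return 0.21 * (1.0 + z) / dish_diameter_m

def beam_scale(z, dish_diameter_m, omega_m=0.31):
    """R_beam = sigma_theta * r(z) in Mpc/h for a Gaussian beam."""
    sigma_theta = theta_fwhm(z, dish_diameter_m) / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return sigma_theta * comoving_distance(z, omega_m)

def beam_damping(k, mu, r_beam):
    """Eq. (116): damping of modes transverse to the line of sight."""
    return math.exp(-0.5 * k ** 2 * r_beam ** 2 * (1.0 - mu ** 2))

if __name__ == "__main__":
    z, dish = 0.25, 13.5  # dish diameter in metres is an assumed value
    r_beam = beam_scale(z, dish)
    print(f"theta_FWHM = {math.degrees(theta_fwhm(z, dish)):.2f} deg")
    print(f"R_beam     = {r_beam:.2f} Mpc/h")
    for k in (0.05, 0.1, 0.2):
        print(f"k = {k:.2f} h/Mpc: B^2(mu=0) = {beam_damping(k, 0.0, r_beam) ** 2:.3f}")
```

Following Eq. (117), the observed power spectrum is obtained by multiplying the model power spectrum by the square of `beam_damping`. Modes along the line of sight ($\mu = 1$) are left untouched.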


3.8.1.3 Thermal noise


Instrumental noise is determined by the telescope configuration and survey strategy (see, e.g., Battye et al., 2013; Bull 
et al., 2015; Pourtsidou et al., 2017, for detailed descriptions of representative cases). For a single-dish experiment, 
the pixel noise is assumed to be described by a Gaussian random field with spread given by: 


\begin{equation}
\sigma_{\rm pix} = \frac{T_{\rm sys}(\nu)}{\sqrt{\delta_\nu\, t_{\rm total}\, (\Omega_{\rm pix}/S_{\rm area})\, N_{\rm dishes}}} . \tag{118}
\end{equation}

Here, $T_{\rm sys}(\nu)$ is the system temperature (including receiver and sky components) at a given frequency, $S_{\rm area}$ the sky
area, $\Omega_{\rm pix} = 1.33\,\theta_{\rm FWHM}^2$ the pixel area, $N_{\rm dishes}$ the number of dishes (this can also include multiple feeds/beams per dish), $\delta_\nu$ the
frequency channel bandwidth, and $t_{\rm total}$ the total observation time. The combination $t_{\rm total}(\Omega_{\rm pix}/S_{\rm area})$ represents the
time spent at each pointing. It follows that the noise power spectrum is:

\begin{equation}
P_{\rm N} = \sigma_{\rm pix}^2 V_{\rm pix} , \tag{119}
\end{equation}

where $V_{\rm pix}$ is the voxel volume. Typical values for a cosmological survey using MeerKAT in single-dish mode are
$T_{\rm sys} \sim 30$ K, $N_{\rm dishes} = 64$, $S_{\rm area} = 5{,}000\,{\rm deg}^2$, and $t_{\rm total} = 5{,}000$ hrs.
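
The single-dish noise level of Eqs. (118) and (119) can be estimated with a short calculation. The sketch below uses the MeerKAT-like values quoted above; the dish diameter (13.5 m), the channel width (1 MHz), and the redshift ($z = 0.25$) are assumptions made for illustration only, and the function names are hypothetical.

```python
import math

def theta_fwhm_deg(z, dish_diameter_m):
    """Beam FWHM in degrees, theta ~ lambda / D_dish with lambda = 0.21 (1+z) m."""
    return math.degrees(0.21 * (1.0 + z) / dish_diameter_m)

def pixel_noise(t_sys_k, delta_nu_hz, t_total_hr, s_area_deg2, fwhm_deg, n_dishes):
    """Eq. (118): rms thermal noise per pixel, in Kelvin."""
    omega_pix = 1.33 * fwhm_deg ** 2                            # pixel area in deg^2
    t_pix_s = t_total_hr * 3600.0 * omega_pix / s_area_deg2     # time per pointing in s
    return t_sys_k / math.sqrt(delta_nu_hz * t_pix_s * n_dishes)

def noise_power(sigma_pix, v_pix):
    """Eq. (119): noise power spectrum for a voxel of volume v_pix."""
    return sigma_pix ** 2 * v_pix

if __name__ == "__main__":
    fwhm = theta_fwhm_deg(z=0.25, dish_diameter_m=13.5)         # assumed dish diameter
    sigma = pixel_noise(t_sys_k=30.0, delta_nu_hz=1.0e6, t_total_hr=5000.0,
                        s_area_deg2=5000.0, fwhm_deg=fwhm, n_dishes=64)
    print(f"theta_FWHM = {fwhm:.2f} deg, sigma_pix = {sigma * 1e6:.1f} micro-K")
```

As expected from the radiometer equation, quadrupling the observing time halves the pixel noise.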


For the simplest form of interferometer, a dual polarization array assuming a uniform antenna distribution, the
noise power spectrum is:

\begin{equation}
P_{\rm N} = T_{\rm sys}^2\, r^2 y_\nu \left(\frac{\lambda^4}{A_e^2}\right) \frac{1}{2\, n(u)\, t_{\rm total}} \left(\frac{S_{\rm area}}{\rm FOV}\right) . \tag{120}
\end{equation}

Here, $A_e$ is the effective beam area, ${\rm FOV} \approx (\lambda/D_{\rm dish})^2$, $r$ is the comoving distance to the observation redshift
$z$, and $y_\nu = c(1+z)^2/(\nu_0 H(z))$ with $\nu_0 = 1420$ MHz. The distribution function of the antennae $n(u)$ can be
approximated as $n(u) \approx N_d^2/(2\pi u_{\rm max}^2)$ for the uniform case, where $N_d$ is the number of elements of the interferometer
and $u_{\rm max} \approx D_{\rm max}/\lambda$ with $D_{\rm max}$ the maximum baseline. Typical values for a compact instrument like HIRAX are
$T_{\rm sys} = 50$ K, $N_d = 1024$, $D_{\rm dish} = 6$ m, $D_{\rm max} = 250$ m, $S_{\rm area} = 15{,}000\,{\rm deg}^2$, and $t_{\rm total} = 10{,}000$ hrs.
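
A rough evaluation of Eq. (120) for the HIRAX-like values quoted above is sketched below. Several ingredients are assumptions of this illustration rather than statements from the text: a flat ΛCDM background with $\Omega_m = 0.31$, distances in $h^{-1}$Mpc, and an effective area $A_e$ equal to 70% of the geometric dish area. The function names are hypothetical.

```python
import math

C_OVER_H0 = 2997.92458   # Hubble distance c/H0 in Mpc/h
NU_0_HZ = 1420.0e6       # rest-frame 21cm frequency in Hz, as quoted with Eq. (120)

def e_of_z(z, omega_m):
    """E(z) = H(z)/H0 for a flat LCDM background (assumed model)."""
    return math.sqrt(omega_m * (1.0 + z) ** 3 + 1.0 - omega_m)

def comoving_distance(z, omega_m=0.31, n_steps=2000):
    """Comoving distance r(z) in Mpc/h, using the trapezoidal rule."""
    dz = z / n_steps
    total = 0.0
    for i in range(n_steps):
        total += 0.5 * dz * (1.0 / e_of_z(i * dz, omega_m)
                             + 1.0 / e_of_z((i + 1) * dz, omega_m))
    return C_OVER_H0 * total

def interferometer_noise_power(z, t_sys_k, n_elements, dish_diameter_m, max_baseline_m,
                               s_area_deg2, t_total_hr, omega_m=0.31,
                               aperture_efficiency=0.7):
    """Eq. (120) for a uniform antenna distribution, in K^2 (Mpc/h)^3."""
    wavelength = 0.21 * (1.0 + z)                                     # metres
    r = comoving_distance(z, omega_m)                                 # Mpc/h
    y_nu = C_OVER_H0 * (1.0 + z) ** 2 / (NU_0_HZ * e_of_z(z, omega_m))  # Mpc/h per Hz
    # Assumed effective area: a fixed fraction of the geometric dish area.
    a_eff = aperture_efficiency * math.pi * (dish_diameter_m / 2.0) ** 2
    fov = (wavelength / dish_diameter_m) ** 2                         # steradians
    s_area = s_area_deg2 * (math.pi / 180.0) ** 2                     # steradians
    u_max = max_baseline_m / wavelength
    n_u = n_elements ** 2 / (2.0 * math.pi * u_max ** 2)              # uniform n(u)
    t_total_s = t_total_hr * 3600.0
    return (t_sys_k ** 2 * r ** 2 * y_nu * wavelength ** 4 / a_eff ** 2
            / (2.0 * n_u * t_total_s) * s_area / fov)

if __name__ == "__main__":
    p_n = interferometer_noise_power(z=1.0, t_sys_k=50.0, n_elements=1024,
                                     dish_diameter_m=6.0, max_baseline_m=250.0,
                                     s_area_deg2=15000.0, t_total_hr=10000.0)
    print(f"P_N = {p_n:.2e} K^2 (Mpc/h)^3")
```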


3.8.2 Sample selection 


HI intensity mapping maps the entire HI flux coming from many galaxies together in large voxels. This means that 
we do not need to select individual galaxies. An advantage of the intensity mapping technique is that it is sensitive to 
all sources of HI emission, regardless of how faint. This is in contrast to traditional galaxy surveys, which are sensitive
only above a flux cutoff. This makes HI intensity mapping ideally suited to probe the global HI content, a key 
quantity for galaxy formation and evolution studies. 


Another fundamental choice is the bandwidth of observation. For example, MeerKAT can perform cosmological 
observations using its L-band (900–1420 MHz) or UHF-band (580–1000 MHz) receivers. The former corresponds
to a redshift range 0 < z < 0.58, while the latter can probe 0.4 < z < 1.45. Band 1 of SKA-MID corresponds to a 
very wide redshift range, 0.35 < z < 3. Other examples are CHIME and HIRAX, with 0.8 < z < 2.5. Depending on 
the bandwidth of observation as well as the sky area coverage and total observing time, these HI intensity mapping 
surveys can measure Baryon Acoustic Oscillations and Redshift Space Distortions, and search for signatures of 
primordial non-Gaussianity. 


The selection of frequency bandwidth and patch of sky can also be tuned to try and mitigate known systematic 
effects: 


• Human-made Radio Frequency Interference (RFI) is a major source of contamination (Harper and Dickinson,
2018). While methods for RFI flagging and removal do exist (Offringa et al., 2010; Akeret et al., 2017), it is
important to perform observations in “radio-quiet” locations.

• Foreground contamination from Galactic synchrotron, free-free emission, and point sources can be orders of
magnitude larger than the HI cosmological signal (see, e.g., Oh and Mack, 2003). Different regions of the sky
are contaminated by the various foregrounds differently, and regions of the sky that are particularly complex
(for example the Galactic plane) should be avoided.

• At the time of writing, there has been no detection of the HI auto-correlation signal due to residual foregrounds
and other systematics (see, e.g., Switzer et al., 2013, 2015). The only available detections come from
cross-correlating HI maps with spectroscopic optical galaxies (Chang et al., 2010; Masui et al., 2013; Anderson et al.,
2018; Wolz et al., 2022). While detecting the HI auto-correlation is the primary aim, it is currently desirable
that the chosen patch of sky overlaps with optical galaxy surveys.


3.8.3 Measurements 


In this section, we summarize the main steps of a typical HI intensity mapping data analysis procedure, based on 
the pioneering works by the GBT, Parkes, and MeerKLASS teams (Chang et al., 2010; Switzer et al., 2013, 2015; 
Masui et al., 2013; Anderson et al., 2018; Wang et al., 2021). In general, single-dish observations require a scanning 
strategy where the dishes are rapidly moving across the sky. The goal is to keep the instrument gains (whose fluctuations
are set by the so-called 1/f noise; Harper et al., 2018) constant, so that the data are limited only by thermal noise
fluctuations, while covering the relevant angular scales. The scanning strategy is also tuned in order to avoid spurious
signals, for example from ground spill and atmospheric emission.


From raw data to maps. The raw data is stored in time-stream blocks. The first step of the data analysis is to 
mitigate RFI contamination. This is facilitated by the high spectral resolution of the data (e.g. 4096 channels across 
200 MHz of bandwidth for the GBT). Individual frequency channels are flagged and removed based on their variance. 
Any RFI in a block whose variance is not prominent enough to be flagged is identified as increased noise later on and 
down-weighted at the map-making stage. Some low-level RFI can be masked after map-making. In addition, aliasing 
issues and high variance often result in removing channels within a few MHz of the band edges, as well as channels 
in the receiver’s resonances. Before mapping, the data are re-binned (to ~1 MHz bins). The time-stream data can 
be converted to sky maps with an inverse-noise weighted chi-squared minimization. This is a known method from 
CMB map-making (Tegmark, 1997), and it produces the maximum likelihood (unbiased and optimal) estimate of 
the sky map assuming the noise is Gaussian. The algorithm also produces an inverse noise covariance matrix, useful 
for applying inverse-noise weights. 
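
The map-making step can be illustrated with a deliberately simplified sketch. In general, the maximum likelihood map is $\hat{m} = (A^T N^{-1} A)^{-1} A^T N^{-1} d$, where $d$ is the time stream, $A$ the pointing matrix, and $N$ the noise covariance. The example below assumes uncorrelated (white) noise and a pointing matrix that assigns each time sample to a single pixel, in which case the solution reduces to an inverse-variance weighted average per pixel. Real pipelines must deal with correlated noise; the function name and the toy numbers are hypothetical.

```python
def make_map(pixel_index, samples, variances, n_pix):
    """Inverse-noise weighted map from a time stream, assuming white noise.

    pixel_index[t] is the pixel observed at time sample t, samples[t] the
    measured value, and variances[t] its noise variance. Returns the map and
    the diagonal of the inverse noise covariance matrix (A^T N^-1 A).
    """
    weighted_sum = [0.0] * n_pix
    inverse_cov = [0.0] * n_pix
    for pix, value, var in zip(pixel_index, samples, variances):
        weighted_sum[pix] += value / var
        inverse_cov[pix] += 1.0 / var
    sky_map = [ws / w if w > 0.0 else float("nan")
               for ws, w in zip(weighted_sum, inverse_cov)]
    return sky_map, inverse_cov

if __name__ == "__main__":
    sky_map, inverse_cov = make_map(pixel_index=[0, 0, 1, 1],
                                    samples=[1.0, 3.0, 2.0, 4.0],
                                    variances=[1.0, 1.0, 1.0, 3.0],
                                    n_pix=3)
    print(sky_map)       # the last pixel is never observed and is set to NaN
    print(inverse_cov)
```

The returned inverse covariance is what would later be used as the inverse-noise weight map in the power spectrum estimation.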


Foreground subtraction. Strong astrophysical foregrounds have to be separated from the cosmological HI signal.
Fortunately, these are expected to be spectrally smooth, following power-laws in frequency (Oh and Mack, 2003;
Santos et al., 2005; Seo et al., 2010), and can be removed if the calibration of the instrument is well controlled.
The top panel of Fig. 31 demonstrates the differences in amplitude of the various foregrounds compared to the
cosmological HI signal, while the bottom panel demonstrates the differences in spectral smoothness.

Figure 31: Top: Angular power spectra for different simulated foregrounds, and the HI cosmological signal. The
black solid line represents the combined signal. All are at a frequency of 1136 MHz ($z = 0.25$). Bottom: Observed
brightness temperatures along a chosen LoS through frequency (redshift). Images reproduced with permission
from Cunnington et al. (2019), copyright by MNRAS.

Figure 32: Simulated HI maps before and after foreground cleaning with PCA. From left to right: a map
with simulated HI signal with added thermal (instrumental) noise; the same map with added 21cm foregrounds;
the “cleaned” map after performing foreground removal with PCA. This figure was produced using the publicly
available code gpr4im (Soares et al., 2021) and MeerKAT-like simulated data products at a frequency of 1136
MHz ($z = 0.25$).


Since the cosmic HI signal fluctuates in a near-Gaussian fashion with frequency, in contrast to the slowly evolving foregrounds
that are also orders of magnitude larger, the two can be separated (Liu and Tegmark, 2011; Wolz et al., 2014; Shaw
et al., 2015; Alonso et al., 2015b). Blind component separation methods aim to identify a set of smooth functions
(the dominant foreground components) and subtract them from the observed maps to uncover the cosmological HI
signal. There is a wealth of different foreground removal algorithms, including parameterized fitting, non-parametric 
fitting, and mode projection (see Liu and Shaw, 2020, for a comprehensive review). Principal Component Analysis 
(PCA) is a popular method that uses mode projection and exploits the fact that foregrounds are much larger in 
amplitude than the signal. PCA works by estimating the data frequency-frequency covariance$^{14}$ matrix and then
performing an eigenvalue decomposition. The strongest modes in frequency (the foregrounds) can then be identified 
and projected out. An advantage of this “blind” approach is that it can take into account a distortion of the 
smoothness of the foregrounds by the instrument, as it works by determining which modes are dominant in the 
observed data. However, the price to pay is that inevitably a part of the cosmological HI signal will also be removed 
(Switzer et al., 2015). Other methods include Independent Component Analysis (ICA) (Chapman et al., 2012; Wolz 
et al., 2017b) and Generalized Morphological Component Analysis (GMCA) (Chapman et al., 2013; Carucci et al., 
2020). The former works by maximizing non-Gaussianity, and the latter is a sparsity-based algorithm that works 
with the spatial structure of the foregrounds in wavelet space. An example of a non-parametric fitting method is 
Gaussian Process Regression (Mertens et al., 2018; Soares et al., 2021). Further methods include the Generalized
Needlet Internal Linear Combination (GNILC) (Olivari et al., 2018) and Kernel PCA (Irfan and Bull, 2021). For 
recent comparisons of different foreground removal methods using real data and simulations, see Hothi et al. (2020); 
Cunnington et al. (2021a); Spinelli et al. (2022). 
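
The PCA procedure described above (estimate the frequency-frequency covariance, decompose it into eigenmodes, and project out the strongest ones) can be sketched as follows. This is a toy illustration, not the implementation used in any of the cited pipelines: the "signal" is uncorrelated Gaussian noise, the foregrounds are two exact power laws in frequency with arbitrary amplitudes, and the function names and all numerical values are assumptions.

```python
import numpy as np

def pca_clean(maps, n_fg):
    """Remove the n_fg strongest frequency modes from maps of shape (n_freq, n_pix)."""
    centered = maps - maps.mean(axis=1, keepdims=True)   # subtract mean of each channel
    cov = centered @ centered.T / centered.shape[1]      # frequency-frequency covariance
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]                    # strongest modes first
    fg_modes = eigvecs[:, order[:n_fg]]
    foreground = fg_modes @ (fg_modes.T @ centered)      # projection onto those modes
    return centered - foreground

def simulate_toy_maps(n_freq=64, n_pix=2000, seed=0):
    """Toy data: unit-variance 'signal' plus two smooth power-law foregrounds."""
    rng = np.random.default_rng(seed)
    freqs = np.linspace(900.0, 1400.0, n_freq)           # MHz, illustrative band
    signal = rng.normal(0.0, 1.0, size=(n_freq, n_pix))
    foregrounds = np.zeros((n_freq, n_pix))
    for spectral_index in (-2.7, -2.1):
        amplitude = 1.0e3 * np.abs(rng.normal(size=n_pix))
        foregrounds += np.outer((freqs / freqs[0]) ** spectral_index, amplitude)
    return signal, foregrounds

if __name__ == "__main__":
    signal, foregrounds = simulate_toy_maps()
    cleaned = pca_clean(signal + foregrounds, n_fg=2)
    centered_signal = signal - signal.mean(axis=1, keepdims=True)
    residual = np.std(cleaned - centered_signal) / np.std(centered_signal)
    print(f"rms contaminated / signal: {np.std(signal + foregrounds) / np.std(signal):.0f}")
    print(f"fractional rms error after cleaning: {residual:.2f}")
```

Even in this idealized case the cleaned maps do not match the input signal exactly, because the part of the signal that lies along the removed modes is subtracted together with the foregrounds.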


In Fig. 32 we have taken simulated HI intensity maps, and added thermal (instrumental) noise and 21cm foregrounds.
We have then performed a PCA foreground cleaning. An important choice made by hand is how many
principal components, $N_{\rm FG}$, to remove. In this case, we show results with $N_{\rm FG} = 3$, which is expected to be near
optimal for idealized simulated cases like the ones we have considered here (Alonso et al., 2015b; Wolz et al., 2017b).
However, a much higher $N_{\rm FG} \sim 30$ has been required for real data analyses (Masui et al., 2013; Wolz et al., 2022).


3.8.3.1 Power Spectrum estimator 


When performing an HI intensity mapping survey, it is useful to separately analyze sub-datasets taken at different
times (seasons) so that the thermal noise of the instrument is independent in each map. This way the HI power
spectrum can be constructed by cross-correlating (and then averaging over) different sub-datasets; this procedure
has the advantage that the final power spectrum is free of the additive thermal noise bias (Switzer et al., 2013; Masui
et al., 2013). The method can also suppress systematics like time-dependent RFI.

Figure 33: Measured HI power spectra demonstrating the HI signal loss effect after foreground cleaning with
PCA ($N_{\rm FG} = 3$). This figure was produced using the publicly available code IntensityTools (Cunnington et al.,
2019; Blake, 2019; Soares et al., 2021) and MeerKAT-like simulated data products at 0.2 < z < 0.58.

$^{14}$RFI can also be detected as frequency-frequency covariance in the foreground cleaning (Switzer et al., 2015).


Intensity maps are over-temperatures measured as a discrete function of position, $\delta(\vec{x}_j) = T(\vec{x}_j) - \bar{T}$, where $\bar{T}$ is
the mean temperature at each frequency slice. The total number of pixels, $N_{\rm pix} = N_x \cdot N_y \cdot N_z$, is defined by the
angular grid and the number of frequency bins. It follows that the Fourier transform of the temperature field is a
function of wavevector $\vec{k}_i$. We can write:

\begin{equation}
\tilde{\delta}(\vec{k}_i) = \sum_{j=1}^{N_{\rm pix}} \delta(\vec{x}_j)\, w(\vec{x}_j) \exp(i\vec{k}_i \cdot \vec{x}_j) , \tag{121}
\end{equation}

where $w(\vec{x}_j)$ is a weighting function normalized to unity.

Let us now introduce the inverse-noise weighted power spectrum estimator in the flat-sky approximation as used
in the GBT and Parkes analyses$^{15}$ (Masui et al., 2013; Wolz et al., 2017b; Anderson et al., 2018; Wolz et al., 2022).
For the cross-correlation of two sub-dataset maps $A$ and $B$, we have:

\begin{equation}
\hat{P}^{AB}(\vec{k}_i) = V_{\rm cell}\, {\rm Re}\left\{ \tilde{\delta}^A(\vec{k}_i) \cdot \tilde{\delta}^B(\vec{k}_i)^* \right\} , \tag{122}
\end{equation}

with $V_{\rm cell} = V_s/N_{\rm pix}$, where $V_s = L_x \cdot L_y \cdot L_z$ is the comoving physical volume of the data cube. For HI intensity
maps, $w(\vec{x}_j)$ is given by the inverse noise map of each season. The estimator can be straightforwardly recast for the
cross-correlation of intensity maps with optical galaxies, denoted with subscript “opt”. The total weighting factor is
then $w(\vec{x}_j) = W(\vec{x}_j)\, w_{\rm opt}(\vec{x}_j)$, where $w_{\rm opt}(\vec{x}_j)$ is given by the optimal weighting function $w_{\rm opt}(\vec{x}_j) = 1/(1 + W(\vec{x}_j) \times \bar{N} P_0)$,
with $P_0 = 10^3\,h^{-3}{\rm Mpc}^3$, and the selection function $W(\vec{x}_j)$ (Feldman et al., 1994). The 1D power spectra, $P(k)$, are
determined by averaging all modes with $k = |\vec{k}|$ within the $k$ bin width. This is the well known power spectrum
monopole. We can also compute higher order multipoles like the quadrupole and hexadecapole following the multipole 
expansion formalism (Blake, 2019; Cunnington et al., 2020b). 
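
The cross-correlation estimator of Eqs. (121) and (122), followed by the spherical averaging into a 1D power spectrum, can be sketched with a fast Fourier transform. The example is a minimal illustration and not the pipeline of the cited analyses: it assumes a regular Cartesian grid, interprets the normalization of the weights as a division by $\sum_j w_A(\vec{x}_j) w_B(\vec{x}_j)$, and tests the estimator on toy white-noise maps. The function names are hypothetical.

```python
import numpy as np

def cross_power_3d(delta_a, delta_b, w_a, w_b, box_lengths):
    """Weighted cross-power of two over-temperature cubes, one value per Fourier mode."""
    v_cell = np.prod(box_lengths) / delta_a.size         # V_cell = V_s / N_pix
    norm = np.sum(w_a * w_b)                             # assumed weight normalization
    # numpy uses exp(-i k x); the sign convention does not affect Re{A B*}.
    ft_a = np.fft.fftn(w_a * delta_a)
    ft_b = np.fft.fftn(w_b * delta_b)
    return v_cell * np.real(ft_a * np.conj(ft_b)) / norm

def bin_power(power_3d, box_lengths, k_edges):
    """Average the 3D power in shells of |k|; also return the number of modes per bin."""
    k_axes = [2.0 * np.pi * np.fft.fftfreq(n, d=length / n)
              for n, length in zip(power_3d.shape, box_lengths)]
    kx, ky, kz = np.meshgrid(*k_axes, indexing="ij")
    k_mag = np.sqrt(kx ** 2 + ky ** 2 + kz ** 2).ravel()
    which_bin = np.digitize(k_mag, k_edges) - 1
    n_bins = len(k_edges) - 1
    p_k = np.full(n_bins, np.nan)
    n_modes = np.zeros(n_bins, dtype=int)
    flat_power = power_3d.ravel()
    for b in range(n_bins):
        selected = which_bin == b
        n_modes[b] = selected.sum()
        if n_modes[b] > 0:
            p_k[b] = flat_power[selected].mean()
    return p_k, n_modes

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    shape, box = (32, 32, 32), (320.0, 320.0, 320.0)     # toy grid, lengths in Mpc/h
    signal = rng.normal(size=shape)
    map_a = signal + rng.normal(size=shape)              # season A: signal + noise
    map_b = signal + rng.normal(size=shape)              # season B: independent noise
    weights = np.ones(shape)
    p_cross = cross_power_3d(map_a, map_b, weights, weights, box)
    p_auto = cross_power_3d(map_a, map_a, weights, weights, box)
    edges = np.linspace(0.05, 0.3, 6)
    print("cross:", bin_power(p_cross, box, edges)[0])
    print("auto: ", bin_power(p_auto, box, edges)[0])
```

In this toy setup the auto-power of a single season is biased high by the noise, while the cross-power between seasons recovers the signal power, which is the motivation for cross-correlating sub-datasets.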


In Fig. 33 we show the measured power spectra from simulations with an input (fiducial) HI signal, to which 
foregrounds are added and then removed using PCA. We can immediately see that the process of foreground cleaning 
results in large-scale HI signal loss. Accounting for this effect is crucial in order to get unbiased HI and cosmological 
constraints (Masui et al., 2013; Bernal et al., 2019; Cunnington et al., 2020a; Soares et al., 2021). More details on 
how this can be done will be presented in Sect. 3.8.5, where we will also describe ways to estimate the statistical 
uncertainties in the measurements. 


$^{15}$For a comprehensive description of all relevant observational effects and the derivation of general weighting schemes, as well as the
publicly available pipeline, see Blake (2019).

3.8.4 Systematic effects


The main known sources of systematic uncertainties affecting the HI measurements are: foreground contamination, 
1/f noise, Radio Frequency Interference, calibration errors, and primary beam effects. We summarize these in the 
list below. 


Foregrounds and polarization leakage. We have already discussed that spectrally smooth foregrounds can be 
many orders of magnitude larger than the HI cosmological signal (see Fig. 31), and how their removal with methods
like PCA or FastICA results in large-scale signal loss (Fig. 33). In addition, the interplay between polarized
foregrounds and the instrument leads to polarization leakage, a non-smooth component that further complicates 
the cosmological analysis. For detailed studies on this subject in the context of HI intensity mapping, see Shaw 
et al. (2015); Alonso et al. (2014, 2015b); Carucci et al. (2020); Cunnington et al. (2021a). The auto-correlation of 
intensity maps is biased by residual foregrounds. However, these residuals and other survey-specific systematics are 
expected to drop out in cross-correlation with optical galaxy surveys, and that is why detections to date have only 
been achieved with cross-correlations (Masui et al., 2013; Anderson et al., 2018; Wolz et al., 2022). 


1/f noise. This is a time-correlated noise component that manifests itself as gain fluctuations and leads
to stripes in the HI intensity maps. This noise can be mitigated with a fast enough scanning strategy, reduced by 
applying conservative PCA cleaning in the time-ordered data, and/or calibrated out. For detailed studies of this 
subject in the context of HI intensity mapping, see Bigot-Sazy et al. (2015); Harper et al. (2018); Li et al. (2021). 


RFI. We have already mentioned Radio Frequency Interference (RFI), which can originate from terrestrial telecommunications
as well as navigation satellites (Harper and Dickinson, 2018), and is a major problem for all radio
observations. Even if the experiment employs RFI mitigation systems, it has been shown that RFI can still dominate
thermal noise in several channels within the band (Switzer et al., 2013; Masui et al., 2013; Wang et al., 2021),
resulting in significant signal loss (~11% for the GBT). In Sect. 3.8.3 we discussed how RFI flagging and removal
is performed, although this is likely not optimal considering the requirements of forthcoming HI intensity mapping 
experiments. For example, missing frequency channels as a result of RFI flagging can compromise the performance 
of foreground removal methods (Carucci et al., 2020; Soares et al., 2021). 


Calibration. Bandpass and flux calibration errors can have a large impact on the HI signal recovery. A successful 
calibration process must calibrate the receiver gain fluctuations, account for the bandpass spectrum that multiplies 
the true sky signal, and calibrate the total power. The main calibration procedures are using periodic noise diodes 
as relative calibration references and tracking known astronomical sources, each of which has its own limitations and 
uncertainties (Masui, 2013; Newburgh et al., 2014; Anderson et al., 2018; Wang et al., 2021). For the GBT observations
used in Masui et al. (2013), uncertainties on the calibration of the reference flux scale and the measurements
of calibration sources with respect to this reference, uncertainty of the measured fluxes, receiver non-linearity, beam
shape irregularities, and other variations led to a 9% total calibration systematic error. This translates to systematic
errors in the derived HI constraints below the statistical errors for the GBT levels of thermal noise, but for future
experiments aiming to perform high precision cosmological measurements, calibration levels must be improved.


Primary beam effects. In the vast majority of HI intensity mapping literature, the telescope beam effect is 
approximated by a perfect Gaussian smoothing, as in Eq. 116. In reality, however, there are side-lobes in the beam
profile; the primary beam can also distort the frequency structure of the foregrounds due to its own dependence on
frequency. A way to mitigate this and other issues related to the instrumental response is to convolve all maps to
a common resolution, with a FWHM larger than that of the largest beam in the frequency band (see, e.g., Switzer et al., 2015;
Wolz et al., 2017b). However, the way this convolution is done is based on the Gaussian beam model. Side-lobes
further complicate foreground removal, and end-to-end simulations will be necessary in order to address this challenge 
(Matshawule et al., 2021; Spinelli et al., 2022). 


3.8.5 Main results and forecasts 


Here we summarize the main data analysis results and forecasts. We begin by describing how the GBT measurements 
and power spectrum analyses have been performed. Then we present current constraints as well as forecasts on HI 
parameters. We end this section by listing some of the cosmological forecasts that have been performed to demonstrate 
the ability of HI intensity mapping surveys to constrain dark energy, gravity, and the initial conditions of the Universe.

3.8.5.1 Data analyses


The most comprehensive HI intensity mapping analyses to date have been performed using GBT observations, alone 
and in combination with optical galaxy surveys (Chang et al., 2010; Switzer et al., 2013, 2015; Masui et al., 2013; 
Wolz et al., 2017b; Wolz et al., 2022). In this section, we will describe how these analyses were performed, and 
present the detections and constraints they achieved. For a comprehensive description of the GBT pipeline and 
analysis software$^{16}$ we refer the reader to Masui (2013).


The GBT data we work with cover $100\,{\rm deg}^2$ on the sky and a redshift range 0.6 < z < 1. The data are contaminated
by RFI and two telescope resonance frequencies. To suppress these effects, several frequency (redshift) channels were
removed. The GBT analysis also uses 4 Sections (sub-seasons) {A, B, C, D} to suppress thermal noise bias as described in
Sect. 3.8.3. The noise is large and highly anisotropic towards the edges of the map, therefore 15 pixels per side are
masked from the analysis. Foreground removal on the GBT data has been performed using PCA (Switzer et al.,
2013; Masui et al., 2013) and FastICA (Wolz et al., 2017b; Wolz et al., 2022). The GBT beam can be approximated
by a Gaussian with a frequency-dependent FWHM, and in order to mitigate some systematic effects as explained in
Sect. 3.8.4, the maps are convolved to a common resolution of 0.44 deg.


Here, we concentrate on the most recent analysis presented in Wolz et al. (2022). In the left panel of Fig. 34 we 
show the (masked) Sections A and D after using $N_{\rm FG} = 36$ in the FastICA foreground removal process. In principle,
the cross-correlation of Sections (e.g. A × B, A × D, etc.) should be a proxy for the HI auto power spectrum. For
this, the GBT analysis uses the estimator of Eq. 122. A correction is applied to the power spectrum estimate for
the telescope beam effect using the discretized, Fourier-transformed Gaussian beam of Eq. 116.


The GBT data are noise dominated. Therefore, for the cross-correlation of the different Sections we can estimate
the measurement errors as:

$\sigma(\hat{P}^{\rm HI}(k_i)) = P_{\rm noise}(k_i) / \sqrt{2 N(k_i)}$ , (123)


with N the number of independent measured modes in the k bin; the factor 1/√2 accounts for the fact that the
two maps are independent. There are various approaches for estimating $P_{\rm noise}$, such as using the power spectrum of
each sub-dataset after the foreground removal as a proxy for the noise (for more details, see, e.g., Wolz et al., 2017b).


Despite the thermal noise bias mitigation, the HI auto power spectrum result is an order of magnitude higher
than what is expected from theory (Switzer et al., 2013). This is because the data suffer from systematic effects,
so we have to resort to cross-correlations with optical galaxies to mitigate them and achieve a detection.


The first detection of the cross-correlation between LSS and HI intensity maps at z ~ 1 was reported by Chang
et al. (2010), using data from the GBT and the DEEP2 galaxy survey. A more significant detection using GBT
intensity maps and overlapping WiggleZ galaxies was achieved in Masui et al. (2013), and again in Wolz et al. (2022)
at the level of ~ 5σ. The latter study also achieved ~ 5σ detections using the LRG and ELG samples from SDSS-eBOSS.
In the right panel of Fig. 34 we show the measured GBT-WiggleZ cross-correlation power spectrum with
$N_{\rm IC} = 36$ used in FastICA for the foreground cleaning of the GBT HI maps. We also show a null diagnostic test
plotting the ratio of data and error. The error in the galaxy-HI cross-correlation is estimated as:


$\sigma(\hat{P}_{\rm g,HI}(k_i)) = \frac{1}{\sqrt{2 N(k_i)}} \sqrt{\hat{P}_{\rm g,HI}(k_i)^2 + \hat{P}_{\rm g}(k_i)\,\hat{P}^{\rm HI}(k_i)}$ . (124)
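The two error estimates of Eqs. 123 and 124 can be sketched in a few lines. This is a minimal illustration, assuming the standard Gaussian form for the cross-power error, in which the galaxy and HI auto-powers include their noise contributions. The function and argument names are hypothetical and are not part of the GBT analysis software.

```python
import numpy as np

def sigma_section_cross(p_noise, n_modes):
    """Eq. 123: error on the cross-power of two independent, noise-dominated maps."""
    p_noise = np.asarray(p_noise, dtype=float)
    n_modes = np.asarray(n_modes, dtype=float)
    return p_noise / np.sqrt(2.0 * n_modes)

def sigma_galaxy_hi(p_cross, p_gal, p_hi, n_modes):
    """Assumed Gaussian form of Eq. 124 for the galaxy-HI cross-power error.

    p_gal and p_hi are the measured auto-powers (signal plus noise) per k bin.
    """
    p_cross, p_gal, p_hi, n_modes = (
        np.asarray(x, dtype=float) for x in (p_cross, p_gal, p_hi, n_modes)
    )
    return np.sqrt((p_cross**2 + p_gal * p_hi) / (2.0 * n_modes))
```

Both functions accept scalars or per-bin arrays, so they can be applied directly to binned power spectra.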


An important component of the GBT data analysis is the use of the transfer function formalism to quantify and
correct for the HI signal loss due to foreground removal (see Fig. 33). We give a brief description of how this
works here, and refer the interested reader to Switzer et al. (2013) and Wolz et al. (2022) for the details. Suppose
we have a set of mock (simulated) HI signal data, denoted by m, and our real (observed) data, d. Let us also
denote the cross-correlation of two data cubes a and b by P(a, b), with P(m) ≡ P(m, m). If our real data were completely
free of foreground contamination, we would simply have P(d + m, m) = P(m): if we inject the mock into the data
and then cross-correlate with the mock, we get back the power spectrum of the mock, because P(d, m) = 0 (the
data correspond to a different realization). Foreground effects distort this picture, so we introduce
a transfer function, T(k), to compensate for the unavoidable signal loss. We then have:


$P({\rm FG}(d + m), m) = P(m) \cdot T$ , (125)


¹⁶ https://github.com/kiyo-masui/analysis_IM
Figure 34: Left: GBT data Sections A and D used in the analysis of Masui et al. (2013); Wolz et al. (2022)
with the masking choices detailed in the main text. The different Sections correspond to different data seasons 
and help mitigate thermal noise bias and other systematic effects. Right: GBT-WiggleZ cross-correlation power 
spectrum (top) and a null diagnostic test (bottom) using data from the analysis of Wolz et al. (2022). 


where FG(d + m) corresponds to foreground cleaning of the (d + m) combined data cube, which takes into account 
the real data effects and systematics. The above formula defines the transfer function (in reality this is constructed 
in 2D, $(k_\parallel, k_\perp)$). The signal loss can reach very high levels (~50%) depending on scale (Wolz et al., 2022), and
therefore the transfer function correction is necessary in order to recover the true HI power spectrum. Assuming the 
chosen fiducial cosmology (which is kept fixed in the GBT analyses) is correct, we can then proceed to perform a 
best-fit analysis for constraining HI quantities. 
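The logic of Eq. 125 can be illustrated with a toy example. The sketch below is a strong simplification and not the GBT pipeline: foreground cleaning is a plain PCA that removes the dominant frequency-frequency eigenmodes, the power spectrum is replaced by a single zero-lag cross-product (so T is one number rather than a function of scale), and the foreground, noise, and mock cubes are synthetic. All names and numerical values are hypothetical.

```python
import numpy as np

def pca_clean(cube, n_modes):
    """Remove the n_modes largest frequency-frequency eigenmodes from a (n_freq, n_pix) cube."""
    cov = cube @ cube.T / cube.shape[1]
    _, eigvecs = np.linalg.eigh(cov)      # eigenvalues are returned in ascending order
    modes = eigvecs[:, -n_modes:]         # dominant modes, assumed foreground-like
    return cube - modes @ (modes.T @ cube)

def transfer_function(data, mock, n_modes):
    """Scale-independent stand-in for T in Eq. 125: P(FG(d + m), m) / P(m)."""
    cleaned = pca_clean(data + mock, n_modes)
    return np.mean(cleaned * mock) / np.mean(mock * mock)

rng = np.random.default_rng(0)
n_freq, n_pix = 32, 4096
freq = np.linspace(1.0, 1.5, n_freq)[:, None]
# Toy foreground: bright and spectrally smooth, with a random amplitude per pixel
foreground = 1e3 * freq**-2.7 * rng.uniform(0.5, 1.5, size=(1, n_pix))
noise = rng.normal(size=(n_freq, n_pix))
data = foreground + noise                 # toy "observed" cube
mock = rng.normal(size=(n_freq, n_pix))   # toy mock HI signal
T = transfer_function(data, mock, n_modes=4)
```

In this toy setup T comes out below unity because the removed modes carry part of the mock signal with them. Dividing a measured power spectrum by T is the correction step described in the text, and a realistic analysis would average over many mocks and evaluate T in bins of scale.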


3.8.5.2 HI measurements and forecasts 


In the post-reionization era, HI intensity mapping provides an excellent probe of galaxy evolution. We will first
review the main findings of the GBT (Masui et al., 2013; Wolz et al., 2022) and Parkes (Anderson et al., 2018)
cross-correlation analyses. The model for $P_{\rm g,HI}$ is given by:


$P_{\rm g,HI}(k) = \bar{T}_{\rm HI}\, b_{\rm HI}\, b_{\rm g}\, r_{\rm HI,opt}\, P_{\delta\delta}(k)$ , (126)


with $b_{\rm HI}$ the HI bias, $b_{\rm g}$ the optical sample bias (WiggleZ, eBOSS ELGs, eBOSS LRGs), $r_{\rm HI,opt}$ the galaxy-hydrogen
correlation coefficient, and $P_{\delta\delta}(k)$ the nonlinear matter power spectrum including a linear RSD boost (for more
details and a discussion on the assumptions and limitations of this empirical model, see Wolz et al., 2022). The
coefficient $r_{\rm HI,opt}$ is dependent on the HI content of the galaxy sample. The model is run through the same pipeline
as the data to include weighting, beam, and window function effects. With the cosmology and optical bias values
kept fixed, and using Equation 115, we can fit the unknown pre-factor $\Omega_{\rm HI} b_{\rm HI} r_{\rm HI,opt}$ to the data.
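Because the model of Eq. 126 is linear in the overall pre-factor, the fit reduces to a one-parameter linear least-squares problem with a closed-form solution. The sketch below assumes a diagonal covariance and a template equal to the model power spectrum evaluated with the pre-factor set to unity, after it has been passed through the same pipeline as the data. The function name and inputs are hypothetical.

```python
import numpy as np

def fit_amplitude(p_data, p_template, sigma):
    """Weighted least-squares amplitude A, and its 1-sigma error, for p_data ~ A * p_template."""
    d = np.asarray(p_data, dtype=float)
    t = np.asarray(p_template, dtype=float)
    w = 1.0 / np.asarray(sigma, dtype=float) ** 2   # inverse-variance weights
    norm = np.sum(w * t * t)
    return np.sum(w * d * t) / norm, 1.0 / np.sqrt(norm)
```

For example, `fit_amplitude(p_measured, p_model_unit, sigma_cross)` would return the best-fit pre-factor and its statistical error for one galaxy sample. A full analysis would also propagate the systematic error quoted in the text.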


Following this procedure, Masui et al. (2013) measured the cross-correlation of the GBT maps with the WiggleZ 15hr and
1hr fields. Fitting data in the range of scales $0.05\,h\,{\rm Mpc}^{-1} < k < 0.8\,h\,{\rm Mpc}^{-1}$, they found $10^3\,\Omega_{\rm HI} b_{\rm HI} r = 0.40 \pm 0.05$
for the combined fields, $10^3\,\Omega_{\rm HI} b_{\rm HI} r = 0.46 \pm 0.08$ for the 15hr field and $10^3\,\Omega_{\rm HI} b_{\rm HI} r = 0.34 \pm 0.07$ for the 1hr field. For a
more restrictive range of scales, their combined measurement was $10^3\,\Omega_{\rm HI} b_{\rm HI} r = 0.44 \pm 0.07$. The errors quoted are
statistical, and Masui et al. (2013) also estimated a ±0.04 systematic error.


Figure 35: Left: Estimates for $\Omega_{\rm HI}$ from Wolz et al. (2022) compared to other measurements in the literature
(see Crighton et al. (2015) and references therein). All estimates are at a central redshift z = 0.78 but they have
been staggered for clarity. Right: Forecasts for $\Omega_{\rm HI}$. This figure was produced using the publicly available code
IM-Fish (Pourtsidou et al., 2017) and SKA-like specifications (SKA Cosmology SWG, 2020).


With similar methodology and considering three different ranges of scales, Wolz et al. (2022) found $\Omega_{\rm HI} b_{\rm HI} r_{\rm HI,Wig} =
[0.58 \pm 0.09\,({\rm stat}) \pm 0.05\,({\rm sys})] \times 10^{-3}$ for GBT-WiggleZ, $\Omega_{\rm HI} b_{\rm HI} r_{\rm HI,ELG} = [0.40 \pm 0.09\,({\rm stat}) \pm 0.04\,({\rm sys})] \times 10^{-3}$ for
GBT-ELG, and $\Omega_{\rm HI} b_{\rm HI} r_{\rm HI,LRG} = [0.35 \pm 0.08\,({\rm stat}) \pm 0.03\,({\rm sys})] \times 10^{-3}$ for GBT-LRG, at z ~ 0.8 and an effective
scale $k_{\rm eff} = 0.31\,h/{\rm Mpc}$. Results were also reported at $k_{\rm eff} = 0.24\,h/{\rm Mpc}$ and $k_{\rm eff} = 0.48\,h/{\rm Mpc}$. The latter case
corresponds to the same range of scales considered in Masui et al. (2013), who found $10^3\,\Omega_{\rm HI} b_{\rm HI} r_{\rm HI,Wig} = 0.34 \pm 0.07$
for the same field. These results imply that red galaxies are more weakly correlated with HI on the scales under
consideration, suggesting that HI is more associated with blue star-forming galaxies and tends to avoid red galaxies.
This is in qualitative agreement with what was found in Anderson et al. (2018), at a lower redshift z = 0.08 (it is
also expected from galaxy evolution studies). Anderson et al. (2018) cross-correlated Parkes HI intensity maps with
red and blue galaxies from the 2dF survey sample. Making some further assumptions, Wolz et al. (2022) also derived
constraints on $\Omega_{\rm HI}(z \approx 0.8)$, which are shown in the left panel of Fig. 35. With little information on HI parameters
beyond the local Universe, these are amongst the most precise $\Omega_{\rm HI}$ constraints in an under-explored redshift range.


Forecasts using the proposed SKA-MID and SKA-LOW surveys are shown in the right panel of Fig. 35. These 
use the anisotropic HI power spectrum (Eq. 114) to break the degeneracy between $\Omega_{\rm HI}$ and $b_{\rm HI}$ (Wyithe, 2008; Masui
et al., 2010; Pourtsidou et al., 2017). Similar measurements can be achieved with instruments like CHIME and 
HIRAX, the MIGHTEE survey (Paul et al., 2021; Chen et al., 2021), and ASKAP (Wolz et al., 2017a). 


3.8.5.3 Cosmological forecasts 


The cosmological forecasts literature for HI intensity mapping surveys is extensive. The main result is that, assuming
excellent calibration and mitigation of systematic and foreground contamination effects, HI intensity mapping
experiments can complement and compete with the largest and best Stage-IV optical galaxy surveys. Both single-dish
and interferometric HI intensity mapping surveys can probe dark energy, gravity, and the initial conditions of
the Universe at a level comparable to optical surveys like Euclid (Laureijs et al., 2011; Blanchard et al., 2020) and
VRO/LSST (The LSST Dark Energy Science Collaboration et al., 2018). Here, we summarize the main findings and
caveats. Unless otherwise stated, we quote 1σ forecast marginal errors for the various parameters, and give a few
representative references for the interested reader.


• Large sky HI intensity mapping surveys with radio telescopes like MeerKAT and SKA-MID (in single-dish
mode), Tianlai, CHIME, HIRAX, PUMA, and SKA-LOW can use the HI power spectrum to probe galaxy
evolution and cosmology over a very wide redshift range (0 < z < 6). Using the CPL parameterization for the dark
energy EoS (Eq. 9), the forecasts give $\sigma(w_0) \sim 0.05$, $\sigma(w_a) \sim 0.15$, with the fiducial values $(w_0, w_a) = (-1, 0)$
for the ΛCDM model. Parameterizing the growth of structure as $f(z) = \Omega_{\rm m}(z)^\gamma$ (Lahav et al., 1991b; Linder,
2003) the forecasts give $\sigma(\gamma) \sim 0.03$, with the fiducial value $\gamma = 0.55$ for GR. For more details, see e.g. Chang
et al. (2008); Masui et al. (2010); Battye et al. (2013); Hall et al. (2013); Bull et al. (2015); Cosmic Visions 21
cm Collaboration et al. (2018); Heneka and Amendola (2018); Weltman et al. (2020); Liu et al. (2020). The
neutrino mass can also be constrained, $\sigma(M_\nu) \sim 0.3$ eV (95% CL) (Villaescusa-Navarro et al., 2015). Most
forecasts are very optimistic, assuming perfect instrument calibration and foreground removal. In addition,
there are degeneracies between the HI parameters and the cosmological parameters, for example HI intensity
mapping surveys without prior assumptions can constrain $\bar{T}_{\rm HI} f\sigma_8$ and not $f\sigma_8$ like optical galaxy surveys.
Forecasts taking into account some of these caveats are presented in Padmanabhan et al. (2019); Bernal et al.
(2019); Camera and Padmanabhan (2020); Soares et al. (2021).


• The aforementioned surveys can probe ultra-large scales and constrain the primordial non-Gaussianity parameter,
$f_{\rm NL}$, to a level $\sigma(f_{\rm NL}) \sim 1$ (Camera et al., 2013; Alonso et al., 2015a; Karagiannis et al., 2020). However,
foreground removal effects can lead to large degeneracies and biased estimates (Cunnington et al., 2020a).
These need to be controlled for HI intensity mapping to reach its full potential.


• Joint analyses of HI intensity maps, optical galaxy surveys (galaxy clustering and cosmic shear), and CMB
experiments can be a powerful way to mitigate systematic effects and constrain HI and cosmological parameters
(Wyithe, 2008; Masui et al., 2010; Pourtsidou et al., 2016; Villaescusa-Navarro et al., 2015; Pourtsidou et al.,
2017; SKA Cosmology SWG, 2020; Viljoen et al., 2020). Using the multiple tracers method can suppress cosmic
variance on large scales and provide the most precise measurements of primordial non-Gaussianity and general
relativistic effects (see, e.g., Alonso and Ferreira, 2015; Fonseca et al., 2015, 2017; Witzemann et al., 2019).


• More futuristic prospects include HI intensity mapping lensing (Pourtsidou and Metcalf, 2014; Jalilvand et al.,
2019), exploiting higher order statistics such as the bispectrum (Karagiannis et al., 2021; Cunnington et al.,
2021b), and cross-correlations between gravitational wave detections and HI intensity maps (Scelfo et al., 2022).
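The two parameterizations quoted in the first item of the list above can be evaluated with a short sketch. It assumes the standard CPL form $w(a) = w_0 + w_a (1 - a)$ and a spatially flat background. The value $\Omega_{\rm m,0} = 0.3$ and all function names are illustrative choices, not values from the forecasts.

```python
import numpy as np

def w_cpl(z, w0=-1.0, wa=0.0):
    """CPL equation of state, w(a) = w0 + wa (1 - a), with a = 1 / (1 + z)."""
    z = np.asarray(z, dtype=float)
    return w0 + wa * z / (1.0 + z)

def omega_m(z, omega_m0=0.3, w0=-1.0, wa=0.0):
    """Matter density parameter at redshift z for a flat universe with CPL dark energy."""
    a = 1.0 / (1.0 + np.asarray(z, dtype=float))
    matter = omega_m0 * a**-3
    dark = (1.0 - omega_m0) * a ** (-3.0 * (1.0 + w0 + wa)) * np.exp(-3.0 * wa * (1.0 - a))
    return matter / (matter + dark)

def growth_rate(z, gamma=0.55, **cosmo):
    """Growth rate parameterized as f(z) = Omega_m(z)^gamma."""
    return omega_m(z, **cosmo) ** gamma
```

With the fiducial values $(w_0, w_a) = (-1, 0)$ and $\gamma = 0.55$, `growth_rate` returns the ΛCDM plus GR growth history. Varying `gamma`, `w0`, or `wa` around these values shows the kind of departures that the quoted forecast errors would be able to detect.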
3.9 Surface Brightness Fluctuations


Tonry and Schneider (1988) introduced the method of surface brightness fluctuations (SBF hereafter) as a way to 
obtain distances to stellar systems based on the discrete nature of star counts. As refined in later papers (e.g., Tonry 
et al., 1990; Jensen et al., 1998; Blakeslee et al., 1999a; Cantiello et al., 2005; Mei et al., 2005a), the SBF method 
uses the stochastic nature of star counts and luminosities to measure a quantity that is closely linked to the mean 
brightness of the red giant branch (RGB) star population in a galaxy or other stellar system. 


3.9.1 Basic idea and equations 


Qualitatively, the idea behind the method is simple, as illustrated in Fig. 36 (panels a-c). Stars that can be indi- 
vidually identified in nearby stellar systems gradually blend into a smooth brightness distribution as the distance 
increases, but the discrete nature of the stars can be discerned through statistical fluctuations in the integrated flux 
of the stars per resolution element. These fluctuations are lower relative to the mean surface brightness (i.e., the 
galaxy appears smoother) at larger distances. 


Observationally, SBF is the ratio of the intrinsic variance (correcting for the blurring by the point spread function, 
PSF) of the stellar light distribution of a region of a galaxy to the mean surface brightness within the same region. 
In the nearby Universe, galaxy surface brightness is independent of distance, but the variance per unit solid angle 
decreases as the inverse square of the distance. The ratio of the variance to the mean has units of flux and constitutes the SBF
observable. Although it may be harder to visualize than other standard candles, such as supernovae or Cepheids, it 
is as rigorously defined, and scales in the same way with distance. Physically, the SBF is related to the ratio of the 
first and second moments of the stellar luminosity function within the region analyzed. 


For example, consider a galaxy that projects a stellar population of $n_i$ stars of flux $f_i$ (where i = 1, ..., N covers the
entire flux interval, i.e. all evolutionary phases, of the stellar population) on a particular pixel k in an image. Along
an isophote (the locus of pixels of equal surface brightness within the galaxy) there are many, say M, independent
realizations of the population $[n_i, f_i]$. Each pixel can be considered a realization. Ignoring, for the moment, the PSF
blurring, these realizations obey Poisson statistics, and the first two moments of the stellar intensity distribution can
be written as:


• $(N \times M)^{-1} \times \sum_{k,i} (n_{k,i} \times f_{k,i})$ is the average surface brightness per realization (or pixel);


• $(N \times M)^{-1} \times \sum_{k,i} (n_{k,i} \times f_{k,i}^2)$ is the mean-squared flux of the realizations.

The index k runs over the pixel realizations, and i runs over the stellar luminosity function bins. If we assume that
the same form of the underlying luminosity function applies to all the pixels in the region being analyzed, then the
mean flux per stellar bin is independent of the pixel: $f_{k,i} = f_i$. The mean SBF flux, which is defined by Tonry and
Schneider (1988) as the ratio of the second to the first moment of the flux along the isophote, then becomes:


$\bar{f} = \frac{\sum_{k,i} n_{k,i} f_i^2}{\sum_{k,i} n_{k,i} f_i} = \frac{\sum_i n_i f_i^2}{\sum_i n_i f_i}$ . (127)


Thus, $\bar{f}$ is the flux-weighted mean stellar flux in the region of the isophote being analyzed. The corresponding
luminosity is $\bar{L}$, which is equal to the ratio of the second to the first moment of the stellar luminosity function and can be
readily calculated from stellar population models. Because of the squared weighting in the numerator, the SBF signal
is dominated by the brightest stars in the population.


In practice, the SBF measurement is done over finite regions of the galaxy. It is not necessary for the surface 
brightness to be constant, but the stellar luminosity function should not vary significantly over the region. One 
deals with the varying surface brightness by subtracting a smooth model for the light distribution and measuring the 
fluctuations in the residual image. In this case, the numerator of $\bar{f}$ becomes the variance with respect to the mean
surface brightness. For a fully rigorous statistical treatment of SBF statistics, see Cerviño et al. (2008). 
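The statement that the variance-to-mean ratio of the pixel fluxes equals the ratio of the second to the first moment of the luminosity function (Eq. 127) can be checked numerically. The sketch below ignores PSF blurring and noise, and the four-bin luminosity function is a made-up example rather than a stellar population model.

```python
import numpy as np

def sbf_flux(n_mean, flux):
    """Eq. 127: ratio of the second to the first moment of the stellar luminosity function."""
    n_mean, flux = np.asarray(n_mean, dtype=float), np.asarray(flux, dtype=float)
    return np.sum(n_mean * flux**2) / np.sum(n_mean * flux)

# Hypothetical luminosity function: mean number of stars per pixel in each flux bin
n_mean = np.array([1000.0, 100.0, 10.0, 1.0])
flux = np.array([1.0, 10.0, 100.0, 1000.0])

# Draw independent pixel realizations with Poisson-distributed star counts
rng = np.random.default_rng(1)
counts = rng.poisson(n_mean, size=(200_000, n_mean.size))
pixel_flux = counts @ flux

fbar_model = sbf_flux(n_mean, flux)               # 277.75 for these inputs
fbar_sim = pixel_flux.var() / pixel_flux.mean()   # variance-to-mean ratio of the pixels
```

In this example every bin contributes equally to the mean surface brightness, yet the brightest bin, with one star per pixel on average, supplies about 90% of the numerator. This is the squared weighting that makes the SBF signal sensitive to the most luminous stars.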


The SBF apparent magnitude is defined as $\bar{m} = -2.5 \log(\bar{f}) + m_{\rm zp}$, where $m_{\rm zp}$ is the magnitude zero-point of
the system. Although $\bar{m}$ can be measured for any galaxy, this does not mean that a useful distance can be derived.
Figure 36: Illustration of SBF observations and measurements. (a) Simulation of the stellar population in a
spheroidal galaxy at the distance of the Virgo cluster ($D_{\rm Virgo}$ ~ 16.5 Mpc, Blakeslee et al., 2009) as observed
with the E-ELT in ~1 hour (Cantiello et al., 2021, in prep.). (b) Same as in panel (a), but for a galaxy ten times
more distant. (c) Same as in panel (a), but for a galaxy fifty times more distant. Stars, which appear marginally
resolved in panel (a), blend together into a smooth brightness profile at larger distances. (d) Near-infrared image
of NGC 1399 from the HST WFC3 camera. (e) Model of NGC 1399’s surface brightness distribution derived from
the WFC3/IR image. (f) Residual frame, obtained from the galaxy image (d) minus the model (e). (g) Typical
luminosity function analysis for estimating the “residual variance” $P_r$ due to contaminating sources: green squares
show the data, the blue curve and red line show the fits to the globular cluster and background galaxy luminosity
functions, respectively, and the solid black line is the combined model luminosity function (data and fits are
from Cantiello et al., 2011). The vertical gray dashed line indicates the GCLF turnover magnitude and the
shaded area shows the magnitude interval where the detection is incomplete. (h) Color-magnitude diagram of
an old stellar population (data for the MW globular cluster NGC 1851 from Piotto et al., 2002); the RGB/AGB
population is highlighted with red dots. (i) A schematic illustration of the SBF power spectrum analysis. Images
reproduced with permission from Blakeslee et al. (2009), Cantiello et al. (2011), and Piotto et al. (2002), copyright
by Astrophysical Journal and Astronomy & Astrophysics.
One also needs a reliable calibration of $\bar{M}$, the absolute magnitude that gives the correct distance modulus $(\bar{m} - \bar{M})$
for a galaxy with the measured $\bar{m}$. $\bar{M}$ depends only on the photometric bandpass and the stellar population in the
galaxy. Thus, unlike other galaxy-based distance indicators, SBF does not depend on the mass, effective radius,
dynamics, or environment of the galaxy, although these properties may influence the stellar population.


The measured SBF magnitude $\bar{m}$ and a proper calibration of $\bar{M}$ are presently used to determine accurate distance
moduli to galaxies within D ~ 150 Mpc, enabling robust constraints on the Hubble parameter in the Local Volume
via the Hubble-Lemaître law:

$H_0 = v / D$ , (128)


where v is the flow-corrected recessional velocity of the target galaxies. 
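The chain from SBF magnitudes to the Hubble constant can be sketched as follows, using the standard definition of the distance modulus, $\bar{m} - \bar{M} = 5 \log_{10}(D / 10\,{\rm pc})$, together with Eq. 128. The function names are hypothetical, and a real analysis would fit many galaxies and propagate the uncertainties in $\bar{m}$, $\bar{M}$, and v.

```python
def sbf_distance_mpc(m_bar, M_bar):
    """Distance in Mpc from the SBF distance modulus (m_bar - M_bar)."""
    distance_pc = 10.0 ** ((m_bar - M_bar + 5.0) / 5.0)
    return distance_pc / 1.0e6

def hubble_constant(velocity_kms, distance_mpc):
    """Eq. 128: H0 = v / D, in km/s/Mpc, for a flow-corrected velocity v in km/s."""
    return velocity_kms / distance_mpc
```

A distance modulus of 35 mag corresponds to 100 Mpc, so a galaxy at that distance with a flow-corrected velocity of 7000 km/s would give 70 km/s/Mpc.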


Of course, not all stellar populations are created equal. In particular, galaxies that have undergone recent star 
formation have poorly calibrated values of $\bar{M}$, which causes systematic uncertainty in their SBF distance. For this
and other reasons, elliptical galaxies are the preferred targets for SBF studies. Until recently, challenges with data 
depth and quality have prevented precise distance measurements for significant samples of galaxies reaching into the 
Hubble flow, but datasets and observing strategies have improved (Cantiello et al., 2018a; Blakeslee et al., 2021; 
Jensen et al., 2021), making SBF a powerful cosmological probe with a bright future. 


3.9.2 Sample Selection 


Measuring SBF magnitudes requires careful modeling and subtraction of the galaxy light distribution. As described 
later (Sect. 3.9.3.1), any small-scale residual features with structure on the scale of the PSF will complicate the 
analysis. Such features may be associated with dust, bars, shells, or other irregularities. In severe cases, the SBF 
signal may be entirely overwhelmed. As a result, the smoothest, most featureless galaxies are the prime targets 
for the SBF method; that is to say, $\bar{m}$ is easiest to measure in giant ellipticals and other early-type galaxies with
substantial bulge components. 


Of course, the stellar fluctuations must be sufficiently bright in order to detect the SBF signal; for distances in 
the Hubble flow, this requires the reddest optical bands or observing in the near-IR. The contamination from dust is 
also reduced at these wavelengths. However, beyond ~ 2 μm, the uncertainties in the calibration become too large
for precise distances. Below we discuss these issues of sample and bandpass selection in more detail. 


Choosing the galaxies. In addition to simplifying the galaxy subtraction for the $\bar{m}$ measurement, early-type
galaxies tend to be dominated by old stellar populations (Fig. 36, panels e and h), which simplifies the $\bar{M}$ calibration.
This can be seen from empirical plots of the $\bar{M}$ versus color relations (e.g., Blakeslee et al., 2009, 2010; Jensen et al.,
2015; Cantiello et al., 2018a; Carlsten et al., 2019) for galaxies at a common distance. The empirical relations show
that for red early-type galaxies, the correlations between the absolute SBF magnitude in red or near-IR bands and
broad-baseline optical color have lower scatter than for fainter, bluer galaxies. In selected passbands (see below), the
small intrinsic scatter in the $\bar{M}$-color calibration relations for red galaxies in principle allows a distance precision as
good as ~ 3%. In practice, the errors are larger because of measurement uncertainties, as discussed in Sect. 3.9.3.1.


Consistent with observations, stellar population models predict less scatter in $\bar{M}$ at a given color for metallicities
similar to those found in massive galaxies (e.g., Blakeslee et al., 2001b; Mei et al., 2005b). At the blue end, galaxies 
may have low metallicities, younger ages, or a combination of both. At these blue colors, the SBF is more affected by 
age than metallicity; thus two galaxies with similar optical colors may have significantly different SBF magnitudes if 
they have had different star formation histories, as discussed by Greco et al. (2021). As a result, the observed scatter 
in the SBF calibration can be quite large at the blue end, and it becomes difficult to measure individual distances 
with a precision better than ~ 10% due to the calibration effects alone. 


Thus, for both measurement and calibration reasons, the ideal target galaxies for the SBF method are red early- 
type galaxies with no recent star formation and little or no dust. In spite of this, SBF measurements cover practically 
the entire mass range of galaxies (e.g., Blakeslee et al., 2009; Jensen et al., 2021; van Dokkum et al., 2018; Carlsten 
et al., 2019) and a wide range of morphologies, including the bulges of spirals (Tonry et al., 2000) and ultra-diffuse 
galaxies (Blakeslee and Cantiello, 2018). As long as there is a clean area of the galaxy without recent star formation, 
it is possible to derive an SBF distance. 


Another consideration in defining an observational sample is that the SBF must be detected to high signal-to- 
noise (S/N), including the effect of correcting for contaminating sources. This puts a practical limit on the distance 
to which the SBF measurements can be made. Of course, for cosmologically interesting measurements, the galaxies
must be distant enough to be in the Hubble flow (i.e., d ≳ 50 Mpc). The depth requirement and the related distance
limit depend sensitively on the bandpass, which we discuss next. 


Choosing the bandpass. The most common photometric bands used for SBF distance measurements in recent
years have been i, I, z, J, H, and K, spanning the wavelength range from ~ 0.8 to ~ 2.2 μm (Tonry et al., 2001;
Jensen et al., 2003, 2021; Mei et al., 2007; Blakeslee et al., 2009, 2010; Cantiello et al., 2007, 2018a; Biscardi et al.,
2008). At shorter wavelengths, the SBF signal is much fainter, and the slope of the $\bar{M}$ relation with color tends to
be steeper because of increased sensitivity to stellar population effects (e.g. Worthey, 1993; Blakeslee et al., 2001b;
Cantiello et al., 2003). Of course, dust is also a bigger problem in bluer bands.


The intrinsic scatter of $\bar{M}$ as a function of color is as low as ~ 0.05 mag for red galaxies in passbands near 1 μm
(Blakeslee et al., 2009; Blakeslee et al., 2021). The intrinsic dispersion is less well constrained at longer wavelengths, 
but appears to be closer to 0.1 mag in the H and K bands (Jensen et al., 2003, 2015), likely because of the increased 
stochastic effect of small numbers of luminous red asymptotic giant branch stars (AGB), the properties of which 
depend sensitively on population age (Raimondo et al., 2005; Raimondo, 2009). 


Another issue that is worse in bluer bands (like B or V) is the contamination of the SBF signal by globular
clusters hosted by the galaxy, point-like sources that produce extra variance in the image. In bands where the SBF is
fainter, the globular clusters must be identified and removed to fainter magnitudes. Even in the I band, for elliptical 
galaxies with typical globular cluster frequencies, sources must be detected and masked to < 0.3 mag of the peak, or 
“turnover,” of the globular cluster luminosity function (GCLF), in order to decrease the contamination to the ~ 20% 
level (Blakeslee and Tonry, 1995), which reduces the uncertainty in the correction to ~ 5% (Tonry et al., 1990). 


In contrast, with the much brighter fluctuations in the K band, it is only necessary to reach within ~ 2 mag of 
the GCLF turnover to reduce the contamination to the same level (Jensen et al., 1998). Thus, the stellar population 
scatter in $\bar{M}$ is a much bigger issue than globular cluster contamination for SBF measurements near 2 μm.


Currently, the most efficient instrumental system available for SBF distances is the F110W (broad J) filter of the 
WFC3/IR on the Hubble Space Telescope (HST). Using this setup, it is possible to measure distances for early-type 
galaxies out to 80 Mpc with a median statistical uncertainty of 4% (Blakeslee et al., 2021; Jensen et al., 2021) in only 
one HST orbit. One of the reasons this system works well is that the near-IR sky background is much fainter from 
space. For future ground-based surveys, such as the one provided by the Vera Rubin Observatory, we expect that 
the y band, which covers the red end of the optical spectrum similar to F110W but extends less far into the near-IR, 
may prove the best choice for SBF measurements. At present, however, we lack data in this band for testing. The 
following section describes SBF measurements for the preferred targets of early-type galaxies in bands suitable for 
accurate distance determination. 


3.9.3 Measurements 


SBF distance determination consists of two parts: measuring the fully corrected apparent SBF magnitude $\bar{m}$ and
determining the best value for the absolute $\bar{M}$ from a calibration based on distance-independent stellar population
properties, typically broadband color. 


3.9.3.1 Measuring SBF magnitudes 


In the absence of any atmospheric and instrumental blurring and external sources of fluctuation, the SBF signal of 
a stellar system would simply be the statistical variance due to the varying numbers and luminosities of the stars in 
each pixel, normalized by the local mean flux. In reality, PSF blurring creates a correlation between adjacent pixels; 
therefore, the SBF signal is measured in Fourier space by determining the amplitude of the component on the scale 
of the PSF in the image power spectrum. If the large-scale light distribution of the galaxy is well-subtracted, then 
the power spectrum will consist mainly of a white noise component and a component convolved with the PSF. There 
may be additional power at lower wavenumbers (larger scales) due to imperfect galaxy subtraction, or at higher 
wavenumbers (smaller scales) due to correction of geometric distortion of the image (Mei et al., 2005a; Cantiello 
et al., 2005), but these wavenumbers can be omitted from the analysis. 
The detailed process of measuring SBF magnitudes is described in numerous papers with some variations based
on the bandpass and other properties of the data (Blakeslee et al., 1999a, 2009; Cantiello et al., 2005, 2007; Jensen
et al., 1998, 2021; Mei et al., 2005a,b). See these papers for details on putting the method into practice. Here, we
highlight the main steps of the procedure in order of execution, and the products of each step.


i) Galaxy model (Fig. 36, panel (e)): a smooth isophotal model of the galaxy surface brightness after sky
subtraction; the resulting model frame corresponds to the first moment of the light distribution.


ii) Residual frame (Fig. 36, panel (f)): difference image obtained by subtracting the galaxy surface brightness
model (and a low-order fit to the background) from the original sky-subtracted image.


iii) Mask frame: mask made by identifying all sources of non-SBF variance (dust, globular clusters, foreground
stars, background galaxies, bright satellite galaxies, tidal features, bars, etc.) down to a specified S/N threshold
and masking them out.


iv) Fluctuation frame: the masked residual frame normalized by the square root of the model frame, used to
measure the normalized stellar fluctuations; also contains contaminating fluctuations from unexcised sources
fainter than the detection limit, plus white noise resulting from photon counting statistics and detector read
noise.


v) Power spectrum frame: 2-D Fourier power spectrum of the fluctuation frame, used to derive the SBF amplitude
after azimuthal averaging (Fig. 36, panel (i)). Because the stellar fluctuations are convolved with the PSF of
the image, in the Fourier domain they are multiplied by the Fourier transform of the PSF (convolved with the
window function of the mask, see below).

Once an accurate PSF template is created from stars in the field and normalized, the fluctuation amplitude
is obtained as the constant $P_0$ in Eq. 129 below. This is obtained by fitting the azimuthally averaged power
spectrum of the fluctuation frame, P(k), with the expectation power spectrum E(k). Here E(k) is a convolution
of the PSF power spectrum and the mask function with which the fluctuation frame was multiplied. In addition,
the power spectrum includes a constant white-noise component; thus, the full power spectrum is modeled as:


$P(k) = P_0 \times E(k) + P_1$ . (129)


vi) Correction for background fluctuations (Fig. 36, panel (g)): globular clusters and background galaxies that are
too faint for direct detection will remain in the image after masking, and their flux will contribute to the $P_0$
component of the power spectrum. To correct for this contamination, we calculate the “residual power” $P_r$ from
contaminating sources by extrapolating a fit to the combined GCLF and background galaxy luminosity function.
The ability to detect and remove the globular clusters is often the limiting factor in how far SBF distances can
be measured. Contamination due to background galaxies is normally much less for giant ellipticals.


vii) SBF magnitude: using the measured fluctuation amplitude $P_0$ and the estimated contribution from
contaminating sources $P_r$, the stellar fluctuation signal is $P_f = P_0 - P_r$. This corresponds to $\bar{f}$ in Eq. 127. Thus,
converting to the SBF magnitude: $\bar{m} = -2.5 \log(P_f) + m_{\rm zp}$, where $m_{\rm zp}$ is the appropriate photometric
zero-point magnitude.
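The last three steps above (the fit of Eq. 129, the correction for residual sources, and the conversion to a magnitude) can be sketched as follows. This is a minimal illustration on synthetic, noise-free data: the Gaussian used for E(k), the numerical values, and the function names are all hypothetical, and a real measurement would restrict the fit to a range of wavenumbers and weight the points by their uncertainties.

```python
import numpy as np

def fit_sbf_power(p_k, e_k):
    """Least-squares fit of P(k) = P0 * E(k) + P1 (Eq. 129); returns (P0, P1)."""
    design = np.column_stack([e_k, np.ones_like(e_k)])
    (p0, p1), *_ = np.linalg.lstsq(design, p_k, rcond=None)
    return p0, p1

def sbf_magnitude(p0, p_r, m_zp):
    """SBF magnitude from the fluctuation amplitude corrected for residual sources."""
    return -2.5 * np.log10(p0 - p_r) + m_zp

# Synthetic example with a Gaussian stand-in for the expectation power spectrum E(k)
k = np.linspace(0.05, 0.5, 40)
e_k = np.exp(-(k / 0.15) ** 2)
p0, p1 = fit_sbf_power(50.0 * e_k + 2.0, e_k)
m_bar = sbf_magnitude(p0, p_r=5.0, m_zp=30.0)
```

Because the model is linear in $P_0$ and $P_1$, an ordinary linear least-squares solve is sufficient and no iterative optimizer is needed.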


3.9.3.2 SBF Calibration 


To obtain a distance from the measured $\bar{m}$, one must adopt an absolute SBF magnitude $\bar{M}$ for the stellar population.
This can be done using either an empirical calibration or theoretical predictions from stellar population synthesis 
models. With some exceptions (e.g., Biscardi et al., 2008), the vast majority of published SBF distances rely on 
empirical calibrations (Tonry et al., 2001; Blakeslee et al., 2001a, 2009; Cantiello et al., 2018a; Jensen et al., 2003, 
2021). 
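
Once an absolute SBF magnitude has been adopted, the distance follows from the standard distance modulus. The sketch below shows this step; the function name is hypothetical and the input magnitudes are illustrative placeholders, not calibration values from any of the cited works.

```python
def sbf_distance_mpc(m_bar, abs_m_bar):
    """Distance in Mpc from apparent and absolute SBF magnitudes,
    using the distance modulus mu = m - M = 5 log10(d / 10 pc)."""
    mu = m_bar - abs_m_bar
    return 10.0 ** ((mu - 25.0) / 5.0)


# Illustrative values only: a distance modulus of 35 mag is 100 Mpc.
d = sbf_distance_mpc(m_bar=33.5, abs_m_bar=-1.5)
```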


The ground-based SBF survey by Tonry and collaborators (Tonry, 1997; Tonry et al., 2001; Blakeslee et al., 1999b) 
measured I-band SBF magnitudes $\bar{m}_I$ and $V-I$ colors for 300 galaxies out to about 40 Mpc and derived the first 
high-quality empirical SBF calibration. To do this, Tonry (1997) plotted $\bar{m}_I$ as a function of $V-I$ for nearby groups 
and clusters, determining a single linear slope for the color dependence of $\bar{m}_I$. The zero-point of the calibration was 
then determined from SBF measurements in the bulges of six spiral galaxies that also had distances measured from 
Cepheids (Tonry et al., 2000). This was revised slightly by Blakeslee et al. (2002) using a recalibrated set of Cepheid 
distances from Freedman et al. (2001). The resulting linear calibration fully specified $\bar{M}_I$ as a function of $V-I$. The 
intrinsic scatter about this relation was estimated to be of order 0.05 mag, although it was fairly uncertain because 
the median statistical error on the distances was roughly four times larger. 


The same basic approach, with some variations, has been used to derive empirical $\bar{M}$-color calibrations for the 
SBF method in V (Blakeslee et al., 2001b), K (Jensen et al., 2003), ACS/F850LP (Mei et al., 2007; Blakeslee et al., 
2009), WFC3/F110W (Jensen et al., 2015), and i (Cantiello et al., 2018a). Higher-order polynomials were used 
for the ACS and WFC3 calibrations, while Cantiello et al. (2018a) presented calibrations that combined two color 
indices, rather than just one as in previous cases. 


In general, the empirical approach works well and the resulting calibrations agree with theoretical predictions 
within the uncertainties. The weak point remains the distance zero-point, which is tied to the Cepheid distance scale 
via measurements in spiral galaxies, which are not ideal targets for the SBF method. However, alternative zero-point 
calibrations based on the tip of the red giant branch (TRGB) have also been presented (Mould and Sakai, 2009; 
Blakeslee et al., 2021), and these agree well with the Cepheid-based calibration. Fully theoretical calibrations of $\bar{M}$ 
versus color do not rely on other distance indicators, and thus do not carry systematic uncertainties from Cepheids 
or other primary distance indicators. However, the often poor agreement among different sets of models shows that 
theoretical calibrations still carry substantial systematic uncertainties, especially in the near-IR bands (e.g., Jensen 
et al., 2015), which are observationally most promising for future SBF studies. 


As a final remark, we note that since the empirical $\bar{M}$ calibrations are parameterized by photometric color, 
precise measurements of the galaxy colors are required for high-quality distance estimates. Thus, great care must be 
devoted to observational details such as photometric calibration, flat-fielding, sky subtraction, etc. 


3.9.3.3 Statistical uncertainties 


Before moving on to systematic effects, we summarize the statistical uncertainties in SBF distance measurements. 
These can be grouped into three categories: i) random errors in the photometric calibration, ii) errors in the 
measurement of the fluctuations themselves, including the corrections for background contamination, and iii) random 
uncertainty in $\bar{M}$ resulting from stellar population effects and errors in the galaxy color estimate. 


The first category includes effects such as flat-fielding, background estimation, uncertainty in the Galactic extinction, 
and uncertainties in the photometric zero-point, that are not specific to the SBF method. These effects are 
typically at the 1% level, but care must be taken to account for them in a consistent way, as they may contribute 
to various parts of the SBF measurement process. For example, extinction uncertainty affects both $\bar{m}$ and the color 
estimate used for determining $\bar{M}$. 


Factors contributing to statistical uncertainties in the SBF measurement include: the accuracy of the galaxy 
surface brightness model, the fit to the image power spectrum to determine $P_0$, extrapolation of the luminosity 
function fit for the contaminating sources (the error is typically 20-25% of the correction itself), and the match of 
the PSF template to the data being analyzed. These errors can be minimized by optimizing the observing (including 
instrument, exposure time, and bandpass) and data analysis strategies. As shown in several works (e.g., Blakeslee 
et al., 2009, 2010; Cantiello et al., 2018a; Jensen et al., 2021), the total statistical uncertainty on $\bar{m}$ can be kept as 
low as 0.04 to 0.05 mag. 


Concerning random errors in $\bar{M}$, if the images have high S/N and are in the same bandpasses used for the SBF 
calibration so that no photometric transformation is needed, then the error in $\bar{M}$ due to the color uncertainty can be 
kept to the ~ 0.01 mag level. In this case, the random error in $\bar{M}$ is dominated by intrinsic scatter in the calibration 
due to stellar population effects. In the I, z, and J bands, this scatter is estimated to be 0.05 to 0.06 mag (e.g., 
Tonry, 1997; Blakeslee et al., 2009; Cantiello et al., 2018b). 


In summary, measurement uncertainties in well-designed SBF observations can be reduced to the ~0.05 mag 
level. If these observations are of red galaxies in a well-suited bandpass near 1 $\mu$m, the intrinsic scatter in the 
calibration relation will be at a similar level. Combining these two sources of error gives a total statistical error as 
low as ~ 0.07 mag, or about 3.3% in distance per galaxy, although ~ 4% is more typical for the median statistical 
error in well-designed SBF distance samples (e.g., Blakeslee et al., 2021; Jensen et al., 2021). 
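
The arithmetic behind this error budget can be checked with a short sketch. It assumes that the two contributions are independent (so that they add in quadrature) and uses the first-order relation between an error in distance modulus and the fractional error in distance; the function names are hypothetical.

```python
import math


def combine_in_quadrature(*sigmas_mag):
    """Total error from independent contributions, in magnitudes."""
    return math.sqrt(sum(s * s for s in sigmas_mag))


def fractional_distance_error(sigma_mag):
    """First-order fractional distance error for an error sigma_mag
    in the distance modulus: d ~ 10^(mu/5), so dd/d = (ln 10 / 5) dmu."""
    return math.log(10.0) / 5.0 * sigma_mag


# Measurement error and intrinsic scatter, both ~0.05 mag as quoted above.
total_mag = combine_in_quadrature(0.05, 0.05)       # about 0.07 mag
total_frac = fractional_distance_error(total_mag)   # about 0.033
```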


3.9.4 Systematic effects 


The dominant systematic uncertainty affecting all SBF distances is the zero-point of the absolute $\bar{M}$ calibration. 
This zero-point was determined by comparing ground-based I-band SBF magnitudes for the bulges of spiral galaxies 
with measured Cepheid distances (Tonry, 1997; Tonry et al., 2000; Ajhar et al., 2001; Blakeslee et al., 2002). In most 
cases, the SBF zero-points in other bands have been set by tying the measurements to the I-band SBF distances 
(e.g., Mei et al., 2007; Blakeslee et al., 2009; Jensen et al., 2015; Cantiello et al., 2018a). 


The most recent analysis of the systematic uncertainty in SBF distances was by Blakeslee et al. (2021), who revised 
the zero point to account for the improved LMC distance determined to ~ 1% precision by Pietrzyński et al. (2019). 
They concluded that the zero-point uncertainty in the Cepheid-calibrated SBF distances in the WFC3/F110W band 
(the most useful for constraining $H_0$) is 0.09 mag, or 4.2% in distance. This is larger than the typical HST SBF 
measurement error. 


Since the SBF method works best for early-type galaxies with old stellar populations, and these do not contain 
the young Cepheid stars (see Sect. 3.9.2 above), it is worth exploring other means for calibrating SBF. The TRGB 
method is ideal for measuring distances of early-type galaxies and obtaining an independent calibration of SBF. As 
with Cepheids, it is possible to calibrate the TRGB method with geometric distances from Gaia (e.g., Soltis et al., 2021), 
but unlike with Cepheids, the stellar population underlying both SBF and the TRGB is the same, i.e., old low-mass 
stars. 


A first attempt to calibrate SBF with TRGB (Mould and Sakai, 2009) used a sample of 16 galaxies within 10 Mpc 
dominated by relatively blue dwarf galaxies and found negligible change with respect to the Cepheid calibration, but 
the distance uncertainties were large and the colors did not extend to the range occupied by massive red ellipticals, 
the preferred SBF targets at large distances. More recently, Blakeslee et al. (2021) rederived the SBF zero-point using 
the few TRGB distances available for massive early-type galaxies. They concluded that the mean offset between the 
Cepheid and TRGB calibrations of SBF was $0.01 \pm 0.10$ mag. Because these two calibrations were independent and 
consistent, they could be combined to improve the precision on the SBF zero-point; this reduces the systematic error 
in the SBF distances to just over 3%. 
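
To see how combining two independent calibrations can bring the zero-point error down to roughly this level, the sketch below applies standard inverse-variance weighting. The Cepheid-based uncertainty of 0.09 mag is the value quoted above; the TRGB-based uncertainty of 0.10 mag is an assumed input chosen only for illustration and is not taken from the cited analysis, whose actual procedure may differ.

```python
import math


def combined_uncertainty(*sigmas):
    """Uncertainty of the inverse-variance weighted mean of
    independent measurements with the given 1-sigma errors."""
    return 1.0 / math.sqrt(sum(1.0 / (s * s) for s in sigmas))


SIGMA_CEPHEID_MAG = 0.09  # quoted in the text
SIGMA_TRGB_MAG = 0.10     # assumed for illustration only

sigma_zp_mag = combined_uncertainty(SIGMA_CEPHEID_MAG, SIGMA_TRGB_MAG)
# Convert to a fractional distance error: dd/d = (ln 10 / 5) * dmu.
sigma_zp_frac = math.log(10.0) / 5.0 * sigma_zp_mag
```

With these inputs the combined zero-point error is about 0.067 mag, i.e. about 3.1% in distance.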


Another potential systematic effect comes from SBF k-corrections for galaxies at larger distances. These must be 
estimated from stellar population models. Based on the model calculations by Liu et al. (2000), Jensen et al. (2021) 
estimated the SBF k-corrections in F110W to be less than 0.01 mag at 100 Mpc, the limit of their sample. Thus, 
k-corrections are not currently a significant problem, but they could become more important for future studies. We 
come back to this issue in Sect. 3.9.5.2. 


In conclusion, the systematic uncertainty on SBF distances is slightly larger than 4% when based solely on 
Cepheids. Combining the best current Cepheid and TRGB calibrations for SBF, the systematic error in distance 
drops to about 3%. Ultimately, the TRGB method should provide much better precision because it can be used 
in the same type of galaxies, giant ellipticals, which are best for SBF measurements, while Cepheids only occur 
in galaxies that are inherently problematic for the SBF method. Blakeslee et al. (2021) estimated that with a 
sample of ~ 15 giant ellipticals having both high-quality SBF and TRGB distances, it would be possible to reduce 
the systematic uncertainty in the SBF zero-point to the 2% level, including the uncertainty in the TRGB absolute 
magnitude calibration, which should soon approach the 1% level, thanks to Gaia. Such an overlapping sample of 
SBF and TRGB distances to giant ellipticals becomes feasible with the advent of JWST. 


3.9.5 Main results and forecasts 
3.9.5.1 Main results 


The SBF method has been in use for several decades. About 600 independent SBF distances (for ~ 400 distinct 
galaxies) have been measured from the Local Group to ~ 130 Mpc. Samples with at least 20 galaxies include: Tonry 
et al. (2001); Jensen et al. (2003, 2021); Mieske et al. (2005, 2006); Blakeslee et al. (2009); Cantiello et al. (2018a); 
Cohen et al. (2018); Carlsten et al. (2019). Soon, there will be another > 200 from the Next Generation Virgo Survey 
(Cantiello et al., 2022, in prep.). Although the method is capable of high precision, the quality of published SBF 
distances is quite heterogeneous, with errors typically 4-5% for HST measurements, 10% for ground-based data on 
giant ellipticals, and up to 30% for some dwarf galaxies. 


SBF distances have been used to map the velocity field of the local Universe (Tonry et al., 2000), constrain the 
cosmic mass density (Blakeslee et al., 1999b), probe the structure of nearby clusters (Mei et al., 2007; Blakeslee et al., 
2009; Cantiello et al., 2018a), estimate supermassive black hole masses (Event Horizon Telescope Collaboration et al., 
2019; Nguyen et al., 2020; Liepold et al., 2020), investigate satellite galaxy systems (Cohen et al., 2018; Carlsten 
et al., 2019), confirm the lack of dark matter in some ultra-diffuse galaxies (van Dokkum et al., 2018; Blakeslee and 
Cantiello, 2018), and measure the most precise distance to the host galaxy of the binary neutron star merger event 
GW170817 (Cantiello et al., 2018b). SBF has also been used for various determinations of the Hubble constant, 
$H_0$ (Tonry et al., 2000; Blakeslee et al., 1999b, 2002; Jensen et al., 2001; Biscardi et al., 2008). Here we focus on two 
recent $H_0$ studies. 


Khetan et al. (2021) presented a recalibration of the peak magnitudes of 24 local SNe Ia using a heterogeneous 
sample of ground- and space-based SBF distances from the literature. Adopting a hierarchical Bayesian approach, 
the authors then extended the calibration to a sample of 96 SNe Ia at redshifts 0.02 < z < 0.08 and derived 
$H_0 = 70.5 \pm 2.4$ (stat) $\pm 3.4$ (sys) km s$^{-1}$ Mpc$^{-1}$. Note that in this case, SBF is used as an intermediate rung in the 
distance ladder, between Cepheids and SNe Ia, rather than constraining $H_0$ directly. When updated for consistency 
with the improved LMC distance from Pietrzyński et al. (2019), the result becomes $H_0 = 71.2 \pm 2.4 \pm 3.4$ km 
s$^{-1}$ Mpc$^{-1}$. 


Blakeslee et al. (2021), using the homogeneous sample of 63 SBF distances measured by Jensen et al. (2021) 
for bright, mainly early-type, galaxies out to 100 Mpc observed with the F110W filter of HST’s WFC3/IR, derived 
$H_0 = 73.3 \pm 0.7 \pm 2.4$ km s$^{-1}$ Mpc$^{-1}$. The systematic (second) error mainly represents the SBF zero-point uncertainty 
after combining the Cepheid and TRGB calibrations. Because peculiar velocities can have an important impact over 
this distance range, Blakeslee et al. (2021) tested four different treatments of the galaxy velocities, including two 
large-scale flow models, and included this effect in the systematic error estimate. Figure 37 shows example Hubble 
diagrams from the study. 


The $H_0$ result by Blakeslee et al. (2021) agrees well with most other local measurements and with Khetan et al. 
(2021) to within $1\sigma$. It disagrees by more than $2\sigma$ with the value of $H_0$ based on the cosmic microwave background, 
assuming the standard $\Lambda$CDM model (Planck Collaboration et al., 2020b), reinforcing the tension. More WFC3/IR 
SBF distances are being obtained by ongoing HST programs; these will improve the constraints on the velocity model 
and further reduce the uncertainties on $H_0$. 
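
The level of agreement quoted above can be reproduced approximately with a simple Gaussian estimate. The sketch below adds statistical and systematic errors in quadrature and treats the two measurements being compared as independent, which is a simplifying assumption (the two SBF-based results share part of their calibration). The CMB-based value of 67.4 ± 0.5 km s$^{-1}$ Mpc$^{-1}$ is the commonly quoted Planck 2018 $\Lambda$CDM result and is not stated in the text above; the function name is hypothetical.

```python
import math


def tension_sigma(value_a, errors_a, value_b, errors_b):
    """Difference between two measurements in units of the combined
    1-sigma error, with all error terms added in quadrature."""
    variance = sum(e * e for e in errors_a) + sum(e * e for e in errors_b)
    return abs(value_a - value_b) / math.sqrt(variance)


# H0 values in km/s/Mpc with (stat, sys) errors from the text.
sbf_direct = (73.3, (0.7, 2.4))    # Blakeslee et al. (2021)
sbf_sne = (71.2, (2.4, 3.4))       # Khetan et al. (2021), LMC-updated
cmb = (67.4, (0.5,))               # assumed Planck 2018 value

vs_cmb = tension_sigma(*sbf_direct, *cmb)      # about 2.3
vs_sne = tension_sigma(*sbf_direct, *sbf_sne)  # about 0.4
```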


3.9.5.2 Forecasts 


The outlook for SBF is bright for several reasons: the next generation of wide-field survey telescopes will produce 
imaging data suitable for SBF measurements; JWST and the AO-assisted ELT facilities will allow the method to 
be pushed to unprecedented distances; and new samples of TRGB distances, tied to Gaia parallaxes, will drastically 
reduce the systematic uncertainty in the SBF zero-point calibration. Sect. 3.9.4 already discussed the expected 
zero-point improvement from the TRGB calibration. Here we comment on the other two anticipated opportunities for 
SBF studies. 


Wide-field surveys. Forthcoming large sky surveys, such as the Vera Rubin Observatory (LSST Science Collabo- 
ration et al., 2009) and the Euclid Wide Survey (Laureijs et al., 2011), will produce breakthroughs in many fields of 
astronomy, including excellent opportunities to use SBF to map the spatial distribution of galaxies in the low-redshift 
Universe. The detailed simulations by Greco et al. (2021) indicate that Rubin will produce an unprecedented dataset 
for SBF studies. The multi-band ugrizy Rubin dataset, with typical seeing of 0.7”, and final $5\sigma$ point-source depth 
of $i_{5\sigma} \sim 26.8$ mag, will make it possible to measure SBF distances with 10% accuracy out to at least 70 Mpc, twice 
as far as the limit of the ground-based SBF survey of Tonry et al. (2001). 


The Euclid satellite¹⁷ has one-fourth the collecting area of HST but, compared to Rubin, it has the advantage of 
near-IR coverage and a sharp (~0.2”), stable PSF. Taking as reference the Euclid/NISP H band, with a predicted $5\sigma$ 
point-source depth of $H_{5\sigma} \sim 24$ mag, Euclid should enable SBF distances for all suitable galaxies out to at least half 
the distance reached with Rubin (~ 30 - 40 Mpc), and possibly more. Another future wide-field mission of enormous interest 
for SBF is the Nancy Grace Roman Space Telescope¹⁸ (Spergel et al., 2015). It will have the same aperture as HST 


¹⁷ https://sci.esa.int/web/euclid 
¹⁸ https://roman.gsfc.nasa.gov/ 

Figure 37: Hubble diagrams and residuals from Blakeslee et al. (2021) based on Cepheid-calibrated WFC3/IR 
SBF distances tabulated by Jensen et al. (2021). The velocities are group-averaged values in the cosmic microwave 
background rest frame without correction for peculiar motions (left) and corrected using the 2M++ flow model 
derived from the redshift-space density field analysis of Carrick et al. (2015) (right). Solid symbols indicate “clean” 
galaxies, for which no dust or spiral structure is evident; open symbols are for galaxies with obvious dust and/or 
spiral structure. The best-fit Hubble constants are indicated, and the statistical and systematic error ranges are 
shown in dark and light gray, respectively. The reduced $\chi^2$ improves from 0.97 for the fit on the left to 0.89 for the 
fit using the flow model, but the value of $H_0$ depends on the overall velocity scale factor, and the study adopted 
the model-independent version for this. $H_0$ would increase by 0.3% for the TRGB-based SBF calibration. Image 
reproduced with permission from Blakeslee et al. (2021), copyright by AstroPhysical Journal. 


and similar resolution, but ~ 100 times the field of view and better IR sensitivity. With a $5\sigma$ point-source depth 
of 28 mag in 1 hr in the J and H bands, Roman will deliver phenomenal survey depth and breadth, making it the 
ultimate machine for producing SBF distances. More detailed simulations are needed to quantitatively refine the 
expectations for both Euclid and Roman. 


Going deeper. JWST (Gardner et al., 2006) and the forthcoming 30-40m class of Extremely Large Telescopes will 
have near-IR imaging capabilities far exceeding that of HST. As discussed above, JWST should greatly improve the 
SBF zero-point calibration by enabling much more extensive direct comparisons of SBF and Gaia-calibrated TRGB 
distances in giant ellipticals. This will significantly reduce the systematic uncertainty in $H_0$, making SBF competitive 
with SNe Ia in this area. 


In addition, these new facilities will make it feasible to go far beyond the previous limit of 100-150 Mpc achieved 
with HST (Jensen et al., 2001, 2021; Biscardi et al., 2008). With its sharper (FWHM~0.1”), better sampled PSF 
in the near-IR and ~ 7 times the collecting area of HST, JWST should enable SBF distance measurements out to 
~ 300 Mpc. As always, a limiting factor will be contamination from globular clusters; a newly significant 
consideration will be the quality of the k-corrections derived from stellar population models (Sect. 3.9.4). Further 
work is needed on this issue. 


The ELTs hold even greater promise for pushing SBF to unprecedented depths, potentially out to z ~0.1, and 
perhaps even directly probing cosmic acceleration and dark energy as a complement to SNe Ia. However, this depends 
critically on the ability to measure precise and reliable SBF magnitudes using adaptive optics (AO). Although some 
studies have been made of this topic (Gouliermis et al., 2005; Jensen, 2012), quantitative demonstrations of 
AO-assisted SBF measurements are lacking. Further work, using actual AO data, is much needed, and again k-corrections 
will be an important ingredient in deriving accurate calibrations. We appeal to the stellar population modelers of 
the world to dedicate some effort to this important problem. 


3.10 Stellar Ages 


The expansion rate of the Universe determines the look-back time. This opens up the possibility of using time (or age) 
measurements to constrain the background parameters of the cosmological model. The cosmic chronometers method 
(see Sect. 3.1) uses relative ages to determine H(z), but absolute ages can also be used in a complementary way. In 
fact, historically, absolute ages were already used in the 1950s, and more extensively in the 1990s (see below), to impose 
constraints on the cosmological model that were competitive at the time. 


3.10.1 Basic idea and equations 


The look-back time t as a function of redshift is given by: 


$$ t(z) = \frac{977.8}{H_0} \int_0^z \frac{dz'}{(1+z')\,E(z')} \ \mathrm{Gyr} , \tag{130} $$


with $E(z) = H(z)/H_0$ and H(z) in km s$^{-1}$ Mpc$^{-1}$. Following Eq. (130), the age of the Universe is $t_U = t(\infty)$. We 
show the dependence of $t_U$ on $H_0$, $\Omega_m$ and a constant EoS parameter w for dark energy in a wCDM model in Fig. 38. 
It is evident that the strongest dependence is on $H_0$, while $\Omega_m$ and w have less influence. 
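
A minimal numerical sketch of Eq. (130) is given below for a spatially flat wCDM model. It neglects radiation (an assumption that changes the age only marginally), uses a simple midpoint rule after the change of variable a = 1/(1+z'), and all function and parameter names are hypothetical. The parameter values in the example are illustrative.

```python
import math

HUBBLE_TIME_FACTOR = 977.8  # Gyr km/s/Mpc, the prefactor in Eq. (130)


def lookback_time_gyr(z, h0, omega_m, w=-1.0, steps=20000):
    """Look-back time t(z) in Gyr for flat wCDM without radiation.

    With a = 1/(1+z'), dz'/[(1+z') E(z')] = da / [a E(a)], and
    a E(a) = sqrt(omega_m / a + omega_de * a**(-1 - 3w)).
    Passing z = math.inf returns the age of the Universe.
    """
    omega_de = 1.0 - omega_m
    a_min = 1.0 / (1.0 + z)  # equals 0.0 for z = infinity
    da = (1.0 - a_min) / steps
    total = 0.0
    for i in range(steps):
        a = a_min + (i + 0.5) * da  # midpoint of the i-th interval
        total += da / math.sqrt(omega_m / a + omega_de * a ** (-1.0 - 3.0 * w))
    return HUBBLE_TIME_FACTOR / h0 * total


# Age for illustrative flat LCDM parameters (close to 13.8 Gyr).
t_u = lookback_time_gyr(math.inf, h0=67.4, omega_m=0.315)
# Einstein-de Sitter limit, where the age is (2/3) * 977.8 / H0.
t_eds = lookback_time_gyr(math.inf, h0=70.0, omega_m=1.0)
```

Varying h0 at fixed omega_m rescales the age as 1/h0, which is the strong dependence visible in Fig. 38.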


The integral in Eq. (130) is dominated by contributions from redshifts below a few tens, decreasing as z grows. 
Therefore, any exotic pre-recombination physics does not significantly affect the age of the Universe. On the other 
hand, E(z) is bound to be very close to that of a CMB-calibrated $\Lambda$CDM model at z < 2.4, as shown in Bernal et al. 
(2021). Hence, a precise and robust determination of $t_U$ which does not significantly rely on a cosmological model, 
in combination with BAO and SNe Ia, may weigh in on proposed solutions to the $H_0$ tension. If an independent 
(and model-agnostic) determination of $t_U$ were to coincide with Planck’s inferred value assuming $\Lambda$CDM, alternative 
models involving exotic physics relevant only in the early Universe would need to invoke additional modifications 
also of the late-Universe expansion history to reproduce all observations, as their prediction for $t_U$ would be too low. 


The color-magnitude diagram (CMD) of co-eval stellar populations in the Milky Way, or in any other nearby galaxy 
where this is observationally possible, can be used to infer the age of the oldest stars. The age can also be estimated 
for individual stars if their metallicity and the distance are known. For resolved stellar populations, however, an 
independent measurement of the distance is not strictly necessary as the full morphology of the color-magnitude 
diagram can, in principle, provide a determination of the absolute age. There is extensive literature on the dating 
of stellar populations; reviews can be found in, e.g., Catelan (2018); Soderblom (2010); Vandenberg et al. (1996). 
In this section, we will focus on the most recent developments in the field. Ages of stars can also be computed via 
nucleo-cosmochronology (see, e.g., Christlieb, 2016), which consists in measuring global abundances of radioactive 
elements like uranium and thorium to estimate the age of the parent star. Another method is to use the cooling 
luminosity function of white dwarfs (see, e.g., Catelan, 2018, and references therein for the current status of these 
methods); while useful, these methods have not yet reached the accuracy of stellar ages measured via the observed 
color-magnitude diagram, and we will not discuss them further. We will instead focus on the use of the color-magnitude 
diagram of globular clusters (GCs), as new developments are providing stellar ages at the few-percent accuracy level. 


The first quantitative attempt to compute the age of the globular cluster M3 was made by Haselgrove and Hoyle 
more than 60 years ago (Haselgrove and Hoyle, 1956). In this work, stellar models were computed on the early 
Cambridge mainframe computer and their results compared “by eye” to the observed color-magnitude diagram. A few 
stellar phases were computed by solving the equations of stellar structure; this output was compared to observations. 
Their estimated age for M3 is only 50% off from its current value.¹⁹ This was the first true attempt to use computer 
models to fit resolved stellar populations and thus obtain cosmological parameters: the age of the Universe in this 
case. Previous estimates of the ages of GCs involved just analytic calculations, which significantly impacted the 
accuracy of the results, given the complexity of the stellar structure equations (see, e.g., Sandage and Schwarzschild, 
1952). 


Historically, the age of the oldest stellar populations in the Milky Way has been measured using the luminosity 
of the Main-Sequence Turn-Off Point (MSTOP) in the color-magnitude diagram of GCs. In this way, however, the 


¹⁹ Their low age estimate is due to the use of an incorrect distance to M3, since the stellar model used deviated just ~10% from current 
models’ prediction of the effective temperature and gravity of stars, with the same, correct assumptions (Vandenberg et al., 1996). 

is fixed to Planck18 $\Lambda$CDM best-fit value (Planck Collaboration et al., 2020a). White lines mark contours with 
constant value of $t_U$. Image reproduced with permission from Bernal et al. (2021), copyright by APS. 


full richness of information contained in the whole color-magnitude diagram is discarded, and only one point is kept. 
While it is true that the MSTOP contains significant information about the age of the stellar population, other parts 
of the CMD do as well, especially around the sub-giant branch and the main sequence below the MSTOP; 
this is crucial to break degeneracies with distance and metallicity content (see Fig. 39). 


Globular clusters are (almost, more on this below) single stellar populations (see, e.g., Vandenberg et al., 
1996). It has long been recognized that they are among the most metal-poor (~1% of the solar metallicity) stellar 
systems in the Milky Way, and exhibit color-magnitude diagrams characteristic of old (> 10 Gyr) stellar populations 
(O’Malley et al., 2017; Catelan, 2018; Vandenberg et al., 1996). 


Of great interest is the fact that determinations of stellar ages in the 1990s provided one of the first hints that the 
dominant cosmological model at the time (an Einstein-de Sitter Universe) needed revision (see, e.g., Ostriker and 
Steinhardt, 1995; Jimenez et al., 1996a; Spinrad et al., 1997). Old stellar populations were determined to be older 
than $2/(3H_0)$, the age of the Universe in that model (see, e.g., Jimenez et al., 1996a). Of course, the age of stellar 
objects at z = 0 is just a lower limit to the age of the Universe and, by itself, does not constrain the cosmological 
model, as changes in $H_0$ and $\Omega_m$ can accommodate an Einstein-de Sitter Universe. 


In the past, in order to break this degeneracy, a determination of the stellar ages of the oldest galaxies at $z \gg 0$ 
proved crucial. This was first achieved by Dunlop et al. (1996). It is revealing to see Fig. 18 in Spinrad et al. 
(1997), which shows the exclusion of the Einstein-de Sitter Universe once the ages of GCs are taken into account. 
This philosophy has been further developed in the cosmic chronometer method, with the first cosmological-model- 
independent determination of the redshift evolution of the Hubble parameter, H(z) (see Sect. 3.1 and references 
therein). 


The determination of the absolute age of a GC inferred using only the MSTOP luminosity is degenerate with 
other properties of the GC. As already shown in the pioneering work of Haselgrove and Hoyle (1956), the distance 
uncertainty to the GC entails the largest contribution to the error budget: a given % level of relative uncertainty in 
the distance determination involves roughly the same level of uncertainty in the inference of the age. Other sources 
of uncertainty are: the metallicity content, the helium fraction, the dust absorption (Vandenberg et al., 1996), and 
theoretical systematics regarding the physics and modeling of stellar evolution. 


However, there is more information enclosed in the full color-magnitude diagram of a GC than that enclosed in its 
MSTOP. As first pointed out in Jimenez and Padoan (1996) and Padoan and Jimenez (1997), the full color-magnitude 
diagram has features that allow for a joint fit of the distance scale and the age (see Fig. 39 for a visual rendering of 
this). Fig. 2 in Jimenez and Padoan (1998) shows how the different portions of the color-magnitude 
diagram constrain the corresponding physical quantities, while Figure 1 in Padoan and Jimenez (1997) and Figure 3 in 
Jimenez and Padoan (1998) show how the luminosity function is not a pure power law, but has features that contain 
information about the different physical parameters of the GC. This technique enabled the estimation of the ages 
of the GCs M68 (Jimenez and Padoan, 1996), M5 and M55 (Jimenez and Padoan, 1998). Moreover, in principle, 
exploiting the morphology of the horizontal branch makes it possible to determine the ages of GCs independently of 
the distance (Jimenez et al., 1996b). 

Figure 39: Dependence of the stellar isochrone on variations of age, metallicity and [$\alpha$/Fe] of the GC with all 
other parameters fixed. Right panels show the relative difference in color. Image reproduced with permission 
from Valcin et al. (2020), copyright by IOP Publishing. 


Further, on the observational front, the gathering of Hubble Space Telescope (HST) photometry for a significant 
sample of galactic GCs has been a game changer. HST has provided very accurate photometry with a very compact 
point spread function, thus easing the problems of crowding when attempting to extract the color-magnitude diagram 
for a GC and making it much easier to control contamination from foreground and background field stars. 

For these reasons, a precise and robust determination of the age of a GC requires a global fit of all these quantities 
from the full color-magnitude diagram of the cluster. In order to exploit this information, and due to degeneracies 
among GC parameters, a suitable statistical approach is needed. Bayesian techniques, which have recently become 
the workhorse of cosmological parameter inference, are of particular interest. In the perspective of possibly using 
the estimated age of the oldest stellar populations in a cosmological context as a route to constrain the age of the 
Universe, it is of value to adopt Bayesian techniques in this context too. 


There are only a few recent attempts at using Bayesian techniques to fit GCs’ color-magnitude diagrams, albeit 
only using some of their features (see, e.g., Wagner-Kaiser et al., 2017). Other attempts to use Bayesian techniques 
to age-date individual stars from the Gaia catalog can be found in Sahlholdt et al. (2019). A limitation of the 
methodology presented in Wagner-Kaiser et al. (2017) is the large number of parameters needed in the likelihood. 
Actually, for a GC of $N_{\rm stars}$ there are, in principle, $4 \times N_{\rm stars} + 5$ model parameters (effectively $3 \times N_{\rm stars} + 4$), 
where the variables for each star are: initial stellar mass, photometry, ratio of secondary to primary initial stellar 
masses (fixed to 0 in Wagner-Kaiser et al., 2017), and cluster membership indicator. In addition, there are 5 
(effectively 4) GC variables, namely: age, metallicity (fixed in the analysis of Wagner-Kaiser et al., 2017), distance 
modulus, absorption, and helium fraction. For a cluster of 10,000 or more stars, the computational cost of this 
approach is very high. To overcome this issue, Wagner-Kaiser et al. (2017) randomly selected a sub-sample of 3,000 
stars, half above and half below the MSTOP of the cluster, “to ensure a reasonable sample of stars on the sub-giant 
and red-giant branches”. Another difficulty arises from the fact that the cluster membership indicator variable can 
take only the value of 0 or 1 (i.e., whether a star belongs to the cluster or not). This creates a sample of two 
populations referred to as a finite mixture distribution (Wagner-Kaiser et al., 2017). 
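
To make the notion of a finite mixture concrete, the sketch below shows one common way of handling a binary membership indicator: it is summed over analytically, so that each star contributes a weighted sum of a cluster density and a field density. This is a generic illustration under simplifying assumptions (a single observable per star, user-supplied densities) and is not meant to reproduce the likelihood of Wagner-Kaiser et al. (2017); all names are hypothetical.

```python
import math


def mixture_log_likelihood(observations, member_fraction, cluster_pdf, field_pdf):
    """Log-likelihood of a two-component finite mixture.

    Each star belongs to the cluster with probability member_fraction
    and to the field otherwise; summing over the 0/1 membership
    indicator gives p(x) = f * p_cluster(x) + (1 - f) * p_field(x).
    """
    if not 0.0 <= member_fraction <= 1.0:
        raise ValueError("member_fraction must lie in [0, 1]")
    return sum(
        math.log(member_fraction * cluster_pdf(x)
                 + (1.0 - member_fraction) * field_pdf(x))
        for x in observations
    )


def gaussian_pdf(mean, sigma):
    """Return a 1-D Gaussian density with the given mean and width."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return lambda x: norm * math.exp(-0.5 * ((x - mean) / sigma) ** 2)


# Toy example: color offsets from an isochrone (illustrative numbers).
offsets = [0.01, -0.02, 0.30, 0.00]
log_like = mixture_log_likelihood(
    offsets,
    member_fraction=0.9,
    cluster_pdf=gaussian_pdf(0.0, 0.03),  # tight locus of cluster members
    field_pdf=gaussian_pdf(0.0, 0.50),    # broad field-star distribution
)
```

Marginalizing the indicator in this way removes one discrete parameter per star, at the cost of introducing a model for the field population.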


Recently, a Bayesian analysis of the GC CMD using all features in it has been carried out by Valcin et al. (2020, 
2021). This has resulted in the joint determination of ages, metallicities and distances for 68 GCs observed by the 
HST/ACS project. The main advantage of the Valcin et al. (2020, 2021) approach is that by using all features in 
the CMD it is possible not only to obtain ages with smaller uncertainties, but also to remove some of the 
systematic uncertainties (Valcin et al., 2021). 


3.10.2 Sample selection 


To obtain a lower limit to the age of the Universe one needs to select the objects hosting the oldest stars. This in 
itself is an obvious circular argument as we will only know which stars are the oldest after having measured their age. 
The most useful approach is to select those GCs with the lowest metallicity, as they will likely be the first formed in 
the Universe. In reality, since the Milky Way only contains a couple of hundred GCs, the most natural approach
would be to just compute the age for all of them and then select the oldest.


This procedure can also be applied to other stellar clusters, such as open clusters, but these tend to be
significantly younger than GCs.


To measure the ages of stars in GCs the sample selection is fairly straightforward: one selects the stars that
belong to the globular cluster. The best procedure to do this is to plot individual stars in the color-magnitude
diagram to identify the locus of cluster members. Computing photometry in crowded fields and identifying cluster
members involve technicalities that require care, but these issues are well under control.
Indeed, we may already have all the data needed, as almost all globular clusters in the Milky Way are known. It
would be useful to obtain resolved stellar populations of GCs in other galaxies, like Andromeda; this is something
that JWST may achieve in the near future. The most important revolution will come from using full-sky surveys to
measure ages of stars systematically. For now, suitable observations for a representative sub-sample of 68 GCs are
available (Valcin et al., 2020, 2021).


3.10.3 Measurements


Accurate photometry is the main requirement for obtaining color-magnitude diagrams of GCs. In addition, it would
be desirable to obtain as much spectroscopy as possible from the resolved stars, as this would help reduce the reliance
on Bayesian priors on metallicity.

Of course, good data require outstanding analysis tools. Simply fitting the MSTOP does not do justice to the
data, as this discards precious information on other parameters besides the age of the GC. The recent use of fully
Bayesian techniques (e.g., Valcin et al., 2020) shows that there is more information in the CMD. Future uses
of likelihood-free inference may extract even more of the information in the CMD.


3.10.4 Systematic effects 


Systematics are the main source of uncertainty when obtaining the absolute age of GCs; note, however, that relative
ages are less prone to systematic uncertainties. Systematic uncertainties are dominated by uncertainties in stellar
evolution and in the distance determination to the GC. The best study of systematic uncertainties in the age determination
of GCs is the work by Chaboyer and collaborators (see, e.g., O’Malley et al., 2017). They arise mostly from
three sources: uncertainties i) in nuclear reaction rates, ii) in the modeling of convection in the outer layers of
low-mass stars, and iii) in the estimation of the distance to the GC. These are the same systematic uncertainties that
affect the age determination obtained using only the MSTOP, but they are ameliorated when using the full morphology of
the color-magnitude diagram (see Valcin et al., 2021). In particular, the uncertainties in both the distance and the
convection in the star’s outer layers can be significantly reduced when using the full CMD. In addition, independent
measurements obtained by the Gaia space mission will drastically reduce the uncertainty on distance. It is worth noting
that the uncertainties concerning stellar nuclear reaction rates could be greatly reduced by producing better theoretical
computations (see also Boylan-Kolchin and Weisz, 2021).


Another source of (systematic) error is the uncertainty in $z_{\rm form}$, the formation redshift needed to infer the
age of the Universe from the age of the star; this was addressed in detail in Jimenez et al. (2019). The determination
of $z_{\rm form}$ will be improved dramatically by the upcoming observations from JWST (Gardner et al., 2006), which will
conclusively map the mass function of objects at $10 < z < 20$.

Figure 40: Probability distribution for the age of the Universe obtained using stellar ages (thin set of lines) and
derived by Planck18 from the CMB assuming the ΛCDM model (thick solid, Planck Collaboration et al., 2020a).
All determinations are in good agreement. Just as an example of what kind of accuracy could be obtained if
systematic uncertainties were all under control, the inset shows the age of the Universe for the formal determination
and formal uncertainty of J18082002-5104378, which is fully compatible with Planck18. The formal GC ages for
69 ACS clusters from O’Malley et al. (2017) would look similar to the J18082002-5104378 line. Image reproduced
with permission from Jimenez et al. (2019), copyright by IOP Publishing.

Figure 41: Age distribution for globular clusters using the Bayesian method of Valcin et al. (2020) when using
the full CMD with different metallicity cuts. The behavior is consistent with the expected age-metallicity relation.
Only the statistical uncertainty is displayed. An additional uncertainty of 0.25 Gyr (Valcin et al., 2021) at 68%
confidence level needs to be added to account for the systematic uncertainty. Image reproduced with permission
from Valcin et al. (2020), copyright by IOP Publishing.


3.10.5 Main results and forecasts 


The most recent determinations of the ages of GCs using the MSTOP and the full CMD Bayesian method are shown
in Figs. 40 and 41. In Fig. 40, taken from Jimenez et al. (2019), age determinations from the literature were used,
including the ages of individual stars and the CMB-derived age. In Fig. 41 the age of the Universe is computed using
the method of Valcin et al. (2020). Despite the very different observables and approaches, there is good agreement
among all the age determinations.

Stellar ages have also proven extremely useful to unveil the nature of the Hubble tension, as done in Vagnozzi
et al. (2021). A summary of how the absolute age $t_U$ determination can weigh in on the current “debate” on the
expansion rate of the Universe is shown in Fig. 42, and further elaborated in Bernal et al. (2021). There are two
independent physical quantities ($H_0$, $t_U$), but three quantities are measured independently: $t_U$ from absolute
ages, $H_0$ from the cosmic distance ladder, and, in a ΛCDM model, $H_0 t_U$ from standard rulers and candles (BAO and
SNe Ia), and a combination of constraints on $t_U$ and $H_0$ from the CMB. This is an over-constrained system which can
be represented on a triad plot (see Bernal et al. (2021), referred to as “new cosmic triangle”) such that
$\log t_U + \log H_0 = \log(H_0 t_U)$, Fig. 42. While BAO, SNe Ia and CMB inferences depend on the cosmological model,
the stellar ages ones (and the distance ladder ones) are cosmology-independent. It is interesting to note that GC ages,
CMB, and BAO+SNe determinations agree, indicating that a ΛCDM-like expansion history is a good fit to the data, in the
redshift window heavily weighed by these data.

Figure 42: Triad plot “new cosmic triangle”: 68% confidence level marginalized constraints on the new cosmic
triangle: the triad corresponding to the age of the Universe and the Hubble constant (upper left) is shown. Note
that all points in the figure sum up to 0, while the ticks in the axes determine the direction of equal values for each
axis. Note that absolute ages of GCs are consistent with the model-dependent Planck18 value. Image reproduced
with permission from Bernal et al. (2021), copyright by Physical Review D.
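
The over-constrained triad can be illustrated numerically. The sketch below uses the standard closed-form age of a flat universe containing only matter and a cosmological constant (radiation neglected), $H_0 t_U = \frac{2}{3\sqrt{\Omega_\Lambda}} \sinh^{-1}\sqrt{\Omega_\Lambda/\Omega_m}$, and compares it with the product of an independently chosen $H_0$ and $t_U$. The numerical inputs are illustrative round numbers, not values taken from Bernal et al. (2021), and the function names are hypothetical.

```python
import math

# 1 km/s/Mpc corresponds to 1/977.8 Gyr^-1 (unit conversion).
GYR_PER_INV_KM_S_MPC = 977.8

def h0_tu_flat_lcdm(omega_m):
    """Dimensionless product H0 * t_U for flat LCDM (matter + Lambda only)."""
    omega_l = 1.0 - omega_m
    return (2.0 / (3.0 * math.sqrt(omega_l))) * math.asinh(math.sqrt(omega_l / omega_m))

def age_gyr(h0, omega_m):
    """Age of the Universe in Gyr for H0 in km/s/Mpc."""
    return h0_tu_flat_lcdm(omega_m) * GYR_PER_INV_KM_S_MPC / h0

# Illustrative, roughly CMB-like inputs (assumed, not from the text).
product_model = h0_tu_flat_lcdm(0.315)
t_model = age_gyr(67.4, 0.315)

# Illustrative "direct" inputs: a distance-ladder-like H0 and a stellar-age-like t_U.
h0_direct, t_direct = 73.0, 13.8
product_direct = h0_direct * t_direct / GYR_PER_INV_KM_S_MPC

# The triad relation: log t_U + log H0 = log(H0 t_U), in consistent units.
lhs = math.log10(t_direct) + math.log10(h0_direct / GYR_PER_INV_KM_S_MPC)
print(product_model, t_model, product_direct, lhs, math.log10(product_direct))
```

With these assumed inputs the model-based and direct products differ by several percent, which is the kind of mismatch the triad plot is designed to expose.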


The future for more accurate GC ages lies on two fronts: reducing the systematic uncertainty in stellar modeling
by constructing improved models, and reducing the width of the priors used for metallicity and distances by resorting to
additional, complementary observations. Direct distances from Gaia (Gaia Collaboration et al., 2016) are particularly
promising. Especially useful will be the final Gaia data release, which will provide percent or sub-percent direct
(parallax-based) distances to GCs. This will tremendously narrow the adopted distance prior range. Another important
ingredient will be the direct spectroscopic determination of chemical abundances in individual stars in GCs, especially
below the MSTOP. JWST (Gardner et al., 2006) will enable enormous progress in these two directions. If these
two priors are constrained at the percent level from direct observations, then the only remaining systematic uncertainty
will be that from constructing the stellar models. As shown in Valcin et al. (2021), when using the full CMD, the
dominant uncertainty left in stellar models is the one due to nuclear reaction rates, which can in principle be improved
by a combination of laboratory and theory efforts.




3.11 Secular Redshift Drift 


Any non-empty universe will exhibit an accelerating or decelerating Hubble expansion, which can be observed as a
secular redshift drift. Sandage (1962) first proposed observing this effect in the optical spectra of galaxies to measure
the cosmic deceleration. Loeb (1998) later suggested using the neutral hydrogen Lyman α forest of absorption lines
toward quasars, and this concept has been developed as a key science case for large optical telescopes (e.g., Corasaniti
et al., 2007; Liske et al., 2008). Large radio telescopes may likewise probe the redshift drift using neutral hydrogen
via the 21 cm emission line from galaxy surveys or using H I 21 cm absorption toward quasars (e.g., Darling, 2012; Yu
et al., 2014; Kloeckner et al., 2015). Measurements require exquisite, repeatable, long-term wavelength calibration
that will most likely rely on a stable local oscillator in both the optical and radio wavelength regimes.


The secular redshift drift is a means to directly observe the cosmic acceleration that does not rely on models,
standard candles, standard rulers, or the cosmological distance ladder. It is capable of directly testing standard dark
energy cosmology and can be used as a probe of cosmological inhomogeneities and thus test the FLRW paradigm and
general anisotropic models (e.g., Quartin and Amendola, 2010). However, the signal is so small (of order $H_0 \Delta t$,
where $\Delta t$ is the duration of observation) that it is unlikely to provide competitive constraints on cosmological
parameters in an era of precision cosmology. For example, Alves et al. (2019) predict that a 40 m-class optical telescope
Lyα forest program combined with an H I 21 cm emission line survey and H I 21 cm absorption line monitoring can provide
independent constraints on $H_0$, $\Omega_m$, and $w_0$ of order 19%, 7%, and 13%, respectively, in a flat $w$CDM model
(marginalized 1σ uncertainties).
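
To give a feeling for how small a signal of order $H_0 \Delta t$ is, the short sketch below converts the Hubble constant to a rate per year and evaluates the scale of the redshift change and of the equivalent velocity shift over an observing baseline. The 20-year baseline and $H_0 = 74$ km s$^{-1}$ Mpc$^{-1}$ are example inputs; the actual signal at a given redshift is smaller than this scale by a cosmology-dependent factor.

```python
KM_PER_MPC = 3.0857e19      # kilometres in one megaparsec
SEC_PER_YR = 3.15576e7      # seconds in one Julian year
C_CM_S = 2.99792458e10      # speed of light in cm/s

def hubble_rate_per_yr(h0_km_s_mpc):
    """Convert H0 from km/s/Mpc to yr^-1."""
    return h0_km_s_mpc / KM_PER_MPC * SEC_PER_YR

h0_yr = hubble_rate_per_yr(74.0)   # Hubble rate in yr^-1
baseline_yr = 20.0                 # example observing baseline (assumed)

dz_scale = h0_yr * baseline_yr     # order-of-magnitude redshift change
dv_scale = C_CM_S * dz_scale       # equivalent velocity scale in cm/s

print(f"H0 = {h0_yr:.3e} per yr")
print(f"H0*dt = {dz_scale:.2e}, c*H0*dt = {dv_scale:.1f} cm/s")
```

Even over two decades the characteristic velocity shift is only tens of cm/s, which is why wavelength calibration stability dominates the experimental design.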


Nevertheless, a measurement of $\dot{z}$ is a model-independent indication of the presence of dark energy, and offers a
means to directly determine the cosmic expansion history. It also offers some improvement on cosmological priors
when combined with more traditional measurements (Alves et al., 2019), and notably tends to break parameter
degeneracies in traditional cosmological probes (Martins et al., 2021).


In the following, we describe the expected secular redshift drift, its dependence on cosmological parameters, 
measurement methods including sample selection and systematic effects, and forecasts of the measurement precision 
and the resulting constraints on cosmological parameters. 


3.11.1 Basic idea and equations 


The observed secular redshift drift, the rate of change of redshift in the current epoch $t_0$, is to first order the
difference between the Hubble expansion of a coasting universe at redshift $z$ and the true Hubble expansion at that
redshift (e.g., Loeb, 1998):

$$\frac{dz}{dt_0} = \dot{z} = (1+z)\,H_0 - H(z) \,. \tag{131}$$

The derivation of this relationship relies only on the null interval obeying $c\,dt = a(t)\,dr$ and the definitions
$1 + z = a(t_0)/a(t_e)$ and $H = \dot{a}/a$: for redshifts measured at times $t_0$ and $t_0 + \Delta t_0$, the redshift
change is

$$\Delta z = \frac{a(t_0+\Delta t_0)}{a(t_e+\Delta t_e)} - \frac{a(t_0)}{a(t_e)} \simeq \left[\frac{a(t_0)}{a(t_e)}\,\frac{\dot{a}(t_0)}{a(t_0)} - \frac{\dot{a}(t_e)}{a(t_e)}\right] \Delta t_0$$

for $\Delta t_0 \ll t_0$. The redshift drift can be recast in terms of an observed acceleration:

$$\frac{dv}{dt_0} = \frac{c\,\dot{z}}{1+z} = c\,H_0 \left[1 - \frac{E(z)}{1+z}\right] , \tag{132}$$

where $E(z)$ is the unitless rescaled Hubble parameter (Eq. 11) that depends on the contents and curvature of the
universe. Measurements of the secular redshift drift thus encode the Hubble constant, the matter density, the
curvature, and the dark energy density and its equation of state. Alves et al. (2019) show that the redshift drift is
most sensitive to $H_0$ and $\Omega_m$ (or $\Omega_\Lambda$) in a canonical flat ΛCDM cosmology. In $w$CDM or
$w_0w_a$CDM models, the effect is less sensitive to $w_0$ and least sensitive to $w_a$ (but these broad statements vary
somewhat as a function of redshift and the span of redshifts explored by a given probe).


Figure 43 shows sample tracks of $\dot{z}$ and $\dot{v}$ versus redshift for a few cosmologies as well as their
differences from a fiducial model. There are a few noteworthy features of the redshift drift: i) the redshifts of the
peak $\dot{z}$, the peak acceleration, and the null between acceleration and deceleration are all independent of $H_0$,
but ii) the amplitudes of the peaks (and the amplitudes of the curves generally) scale with $H_0$. The redshifts of the
peaks and the null depend sensitively on the energy densities, including the curvature, but are somewhat insensitive to
$w_0$ and $w_a$ when these are close to the canonical values. For example, the $\dot{z} = 0$ redshift varies by roughly
$z = 2.5 \mp 0.5$ for $\Omega_m = 0.27 \pm 0.03$ in a flat ΛCDM cosmology.

Figure 43: Secular redshift drift (left) and apparent acceleration (right) versus redshift. All loci assume $H_0 = 74$
km s$^{-1}$ Mpc$^{-1}$, $\Omega_m = 0.27$, $\Omega_\Lambda = 0.73$, and $w_0 = -1$ unless otherwise indicated. The top row
shows the full signal, and the bottom row shows the difference between several models and the fiducial cosmology.


Measurements of $\dot{z}$ at a variety of redshifts can thus probe epochs of acceleration caused by dark energy
($z \lesssim 2.5$) as well as epochs of deceleration caused by matter ($z \gtrsim 2.5$). This measurement is challenging
because the size of the acceleration is small: it reaches a peak value of roughly 0.4 cm s$^{-1}$ yr$^{-1}$ at
$z \simeq 0.76$. The peak in $\dot{z}$ is $\sim 2 \times 10^{-11}$ yr$^{-1}$ (or $\sim H_0/3$) at $z \simeq 1.1$, as shown
in Fig. 43. Provided one can achieve adequate precision and measurement stability
over years to decades, nearly any redshift indicator can be used to measure the secular redshift drift, including
spectral lines (emission or absorption) and spectral edges or continuum breaks.
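
The peak and null redshifts quoted above can be checked directly from Eqs. (131) and (132). The following is a minimal sketch for a flat ΛCDM model with the fiducial values of Fig. 43 (radiation and curvature neglected); the grid resolution and the function names are arbitrary choices made here.

```python
import numpy as np

KM_PER_MPC = 3.0857e19
SEC_PER_YR = 3.15576e7
C_CM_S = 2.99792458e10

def e_of_z(z, omega_m):
    """Rescaled Hubble parameter E(z) = H(z)/H0 for flat LCDM."""
    return np.sqrt(omega_m * (1.0 + z) ** 3 + (1.0 - omega_m))

def zdot(z, h0, omega_m):
    """Redshift drift dz/dt0 in yr^-1, Eq. (131)."""
    h0_yr = h0 / KM_PER_MPC * SEC_PER_YR
    return h0_yr * ((1.0 + z) - e_of_z(z, omega_m))

def vdot(z, h0, omega_m):
    """Apparent acceleration dv/dt0 in cm/s/yr, Eq. (132)."""
    return C_CM_S * zdot(z, h0, omega_m) / (1.0 + z)

h0, omega_m = 74.0, 0.27
z = np.linspace(0.0, 5.0, 5001)
zd = zdot(z, h0, omega_m)
vd = vdot(z, h0, omega_m)

z_peak_v = z[np.argmax(vd)]          # redshift of peak acceleration
z_peak_z = z[np.argmax(zd)]          # redshift of peak redshift drift
late = z > 1.0                       # skip the trivial zero at z = 0
z_null = z[late][np.argmax(zd[late] < 0.0)]   # first grid point with zdot < 0

print(f"peak dv/dt0 = {vd.max():.2f} cm/s/yr at z = {z_peak_v:.2f}")
print(f"peak dz/dt0 = {zd.max():.2e} per yr at z = {z_peak_z:.2f}")
print(f"null at z = {z_null:.2f}")
```

Re-running with $\Omega_m = 0.24$ or $0.30$ shifts the null to higher or lower redshift respectively, while changing `h0` only rescales the amplitudes.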


Since the most accessible measurements rely on spectral line centroiding, high signal-to-noise observations of
many narrow lines are required, and narrow lines tend to be absorption lines (we exclude astrophysical masers from
consideration). The technique therefore favors reasonable optical depth (but unsaturated) absorption lines toward
bright optical or radio continuum sources. The Lyman α forest provides a high-$N$, high-$\sigma$ per line regime while
radio absorption lines provide (for now) low-$N$, low-$\sigma$ measurements. The two regimes are likely to be competitive
in the long run, although the Lyα forest method will likely be less susceptible to, and better able to average out,
gravitational accelerations caused by the local environment and large-scale structure (see Sect. 3.11.4).


H I 21 cm radio emission from galaxies has also been proposed as a means to measure $\dot{z}$ using large galaxy surveys
(Kloeckner et al., 2015). This approach relies on large samples, $\sim 10^7$ galaxies per measurement, in order to overcome
the large line width that samples the full rotation curve of galaxies, the large expected internal accelerations, and
the acceleration caused by large-scale structures (as will be discussed in Sect. 3.11.4).


3.11.2 Sample selection


There exist three main methods for detecting the secular redshift drift: (1) Lyman α forest absorption lines toward
bright quasars, (2) H I 21 cm emission from galaxies, and (3) H I 21 cm absorption toward radio sources. There are
additional methods beyond these that have not been as well developed, such as molecular absorption lines toward
bright (sub)mm continuum sources. It is also certain that additional clever ideas will arise (see especially Kim et al.,
2015) as the notion of directly measuring the cosmic acceleration gains traction and becomes more realistic with new
facilities.


Bright quasars are needed to maximize signal-to-noise in high-resolution spectra of the Lyα forest. Quasars must
also be redshifted enough to place Lyα redward of the atmospheric UV cutoff. To maximize spectral coverage per
observation, optimal quasars would have $z \sim 5$ and be as bright as possible. The number of monitored quasars does not
need to be large because the large-$N$ statistics arise from the hundreds of absorption lines seen along each sight-line
(e.g., Liske et al., 2008).


H I 21 cm emission line surveys rely on areal coverage and redshift selection. Redshift selection for a fiducial Square
Kilometer Array (SKA) survey is flux-limited, and the ability to measure the redshift drift is limited by the number
of detected galaxies and their signal-to-noise. Typically, $\sim 10^7$ galaxies need to be observed within a redshift bin,
and Kloeckner et al. (2015) predict that $\dot{z}$ can be measured up to $z \sim 1$.


At present, there are only ~140 H I 21 cm absorption line systems known, which is a consequence of limited surveys,
limited bandwidths, radio frequency interference (RFI), and flux sensitivity (absorption systems are generally only
detected toward Jy-level continuum radio sources at ~1 GHz). As areal coverage and sensitivity of surveys increase
with SKA prototypes, the Five-hundred-meter Aperture Spherical radio Telescope (FAST), the Canadian Hydrogen
Intensity Mapping Experiment (CHIME), and ultimately the full SKA, the number of known systems is expected to
increase by more than an order of magnitude.


Most planned or current surveys expect to detect at least hundreds of new H I 21 cm absorption line systems.
For example, the ASKAP FLASH survey expects to detect several hundred new 21 cm absorption line systems at
$z < 1$ (Sadler et al., 2020). Jiao et al. (2020) describe a commensal FAST survey that is predicted to detect roughly
800, 1900, and 2600 H I 21 cm absorption systems with $z < 0.37$ in 1, 5, and 10-year surveys, whereas Zhang et al.
(2021) predict more than 1500 absorbers would be detected at $z < 0.37$ by FAST. CHIME, however, will survey the
northern sky continuously and is predicted to detect $\sim 10^5$ absorption lines in $0.8 < z < 2.5$ (Yu et al., 2014).


3.11.3 Measurements 


Here we focus on the expected precision obtained by redshift drift measurements (forecasts for cosmological
parameters based on the following predicted measurements are described in Sect. 3.11.5). Figure 44 depicts the following
predictions:


1. Following Liske et al. (2008) Eq. 16, we predict measurements based on a generic 42 m ELT. In the figure,
we assume a two-epoch Lyα forest monitoring program of 10 quasars with S/N of 3000 spanning 20 years.
Such a program is expected to reach acceleration uncertainties of 0.22-0.08 cm s$^{-1}$ yr$^{-1}$ over redshifts $z = 2$-5.
It may be possible to improve upon this prediction using absorption lines beyond Lyα, such as other Lyman
series lines or metal lines that arise from higher column density clouds (Liske et al., 2008). Moreover, Cooke
(2020) presented a “Lyα cell” calibration technique that uses relative accelerations of metal and Lyα forest lines
to provide a larger lever arm on the signal and to allow internal wavelength calibration of spectra. Finally,
Eikenberry et al. (2019b,a) proposed a dedicated non-ELT facility comprising many small telescopes that could
reduce the detection time to 5 years.


2. The full SKA, following Kloeckner et al. (2015) (see also Martins et al., 2016), is predicted to use 21 cm emission
from galaxies to measure $\dot{z}$ with 1-10% uncertainty over redshifts $z = 0.1$-1.0 in two epochs spanning 12 years.
Galaxy-scale emission line profiles are broad (100s of km s$^{-1}$, modulo inclination), which translates into a
factor of ~1000 in sample size needed to roughly match absorption line centroiding, all else equal. We suggest
that emission line edges and object-by-object cross-correlation may improve the expected performance of this
technique but that the sensitivity of this technique to $\dot{z}$ needs to be modeled in detail using observed 21 cm
emission line profiles.


3. Provided the expected populations of H I 21 cm absorption line systems are detected by FAST and SKA
precursors (as discussed in Sect. 3.11.2), we can modify the Darling (2012) predictions to make new estimates of
the redshift drift measurement. A 20-year FAST monitoring program of 1000 absorption lines at $z < 0.37$ will
obtain acceleration precision of roughly ±0.08 cm s$^{-1}$ yr$^{-1}$. Likewise, a 10-year SKA program observing two
redshift bins at $z = 0.55$ and $z = 0.85$ with 500 lines each can reach rms acceleration noise of ~0.08 cm s$^{-1}$ yr$^{-1}$,
which is similar to the expectation for 21 cm emission.

Figure 44: Forecast acceleration measurements versus redshift for the SKA using H I 21 cm emission from galaxies
(Kloeckner et al., 2015), CHIME using H I 21 cm absorption (Yu et al., 2014), H I 21 cm absorption using FAST and
the SKA estimated from projected detections (see text), and an ELT program that monitors the Lyα forest (Liske
et al., 2008). The cosmological tracks follow Figure 43. The shaded loci and error bars indicate 1σ uncertainties.
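
The way sample size, per-line precision, and time baseline trade against each other in the forecasts above can be sketched with simple error propagation. This is a generic two-epoch scaling under the assumption of independent lines with equal centroid precision; it is not the Liske et al. (2008) Eq. 16 or the Darling (2012) prescription, and the per-line velocity precision used below is a hypothetical input.

```python
import math

def two_epoch_accel_sigma(sigma_v_cm_s, n_lines, baseline_yr):
    """1-sigma acceleration uncertainty (cm/s/yr) from two epochs.

    Assumes n_lines independent lines, each with centroid precision
    sigma_v_cm_s per epoch, observed at two epochs baseline_yr apart.
    """
    sigma_dv = math.sqrt(2.0) * sigma_v_cm_s / math.sqrt(n_lines)
    return sigma_dv / baseline_yr

def lines_needed(sigma_v_cm_s, target_sigma, baseline_yr):
    """Number of lines needed to reach target_sigma (cm/s/yr)."""
    return math.ceil(2.0 * (sigma_v_cm_s / (target_sigma * baseline_yr)) ** 2)

# Hypothetical per-line, per-epoch centroid precision of 30 cm/s.
sigma_narrow = two_epoch_accel_sigma(30.0, n_lines=1000, baseline_yr=20.0)

# Degrading the per-line precision tenfold raises the required sample ~100x.
n_narrow = lines_needed(30.0, target_sigma=0.08, baseline_yr=20.0)
n_broad = lines_needed(300.0, target_sigma=0.08, baseline_yr=20.0)
print(sigma_narrow, n_narrow, n_broad)
```

Because the required sample grows as the square of the per-line centroid uncertainty, broad emission lines demand far larger samples than narrow absorption lines, while a longer baseline reduces the required sample quadratically.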


Yu et al. (2014) predict that CHIME can reach 0.08-0.14 cm s$^{-1}$ yr$^{-1}$ uncertainties spanning the range
$z = 0.8$-2.5 in a 10-year survey. The key differences between CHIME and FAST or SKA programs are the 100-fold higher
number of expected absorption line systems and the daily observation of every system over 10 years. If absorption
line systems are detected at the predicted rate, this suggests that CHIME will be competitive with two- or few-epoch
surveys of $\sim 10^3$ systems that require much larger collecting areas.


Figure 44 shows the measurement forecasts and illustrates how the signal can be detected, although the measurements
cannot generally discriminate between cosmologies that are consistent with current paradigms. They can, however,
definitively and directly demonstrate the influence of dark energy on the cosmic expansion without use of standard
distance indicators or models.


3.11.4 Systematic effects 


Systematic effects include the ability to obtain stable and repeatable wavelength or frequency calibration, the relative
angular motions of absorbing gas with respect to illumination sources, illumination source variability in size, flux,
and spectral properties, motion of the observer, peculiar velocities and accelerations, and gravitational accelerations
internal to and between monitored objects. Observations are made from a very non-inertial reference frame that
reflects multiple accelerations and rotations, although these will be well-determined in the near future to better
precision than is needed for the $\dot{z}$ measurement.


The requisite calibration stability relies on a local oscillator, and current radio facilities already support this 
level of precision (e.g., Cooke, 2020). Optical spectroscopy will require stable references such as laser combs and 
actively-controlled high-precision spectrographs (e.g., Eikenberry et al., 2019b,a). 

Gravitational accelerations within galaxies, between galaxies, and within galaxy clusters are of order 1 cm s$^{-1}$ yr$^{-1}$.
For example, the acceleration of the Solar System barycenter due to its orbit within the Galaxy is ~0.7 cm s$^{-1}$ yr$^{-1}$
(e.g., Titov et al., 2011; Charlot et al., 2020; Gaia Collaboration et al., 2021), which is larger than the peak
cosmological acceleration. The $\dot{z}$ signal, however, has a well-defined sign at low and high redshifts (away from the
null value), while gravitational accelerations will be randomly distributed and null-centered. The net effect of peculiar
accelerations will therefore be added noise, which may drive up sample sizes, integration times, and program duration.
Gravitational accelerations will be largest for 21 cm emission and absorption lines.

Figure 45: Forecast cosmological parameter constraints using the combined secular redshift drift measurements
presented in Figure 44 for a flat ΛCDM model (left) and a flat CPL model (right).
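
The statement that null-centered peculiar accelerations act as added noise rather than bias can be illustrated with a toy Monte Carlo. The sketch assumes Gaussian, zero-mean, independent peculiar accelerations with an rms of 1 cm s$^{-1}$ yr$^{-1}$ (the order of magnitude quoted in the text) added to a hypothetical cosmological signal of 0.3 cm s$^{-1}$ yr$^{-1}$; real distributions need not be Gaussian or uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(0)

signal = 0.3        # hypothetical cosmological acceleration, cm/s/yr
pec_rms = 1.0       # assumed rms of peculiar accelerations, cm/s/yr
n_systems = 1000    # number of monitored systems (example value)

# Each system shows the common signal plus its own random peculiar term.
observed = signal + rng.normal(0.0, pec_rms, size=n_systems)

mean_accel = observed.mean()
mean_error = observed.std(ddof=1) / np.sqrt(n_systems)

print(f"recovered {mean_accel:.3f} +/- {mean_error:.3f} cm/s/yr")
```

The scatter of a single system exceeds the signal, yet the sample mean recovers it with an uncertainty that shrinks as $1/\sqrt{N}$, which is why peculiar accelerations translate into larger samples or longer programs.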


H I 21 cm absorption lines can be intrinsic to the host of the illumination source or intervening between the
illumination source and the observer, but are generally going to have column densities associated with damped Lyα
systems and are therefore associated with galaxies rather than intergalactic clouds. Peculiar accelerations are of larger
concern in these systems than in the Lyα forest (Cooke, 2020), particularly in light of the comparatively smaller number
of clouds that will be used for the measurements, except for CHIME (if the expected absorption line population is
realized).


Loeb (1998) and Liske et al. (2008) explored the impact of peculiar acceleration on the Lyα forest and found that
it is significantly smaller than the cosmological signal. Cooke (2020) used hydrodynamical simulations to calculate
peculiar accelerations in the Lyα forest and in gas in galaxies and found that the Lyα forest peculiar accelerations
are much smaller than the cosmological signal except at the $\dot{z}$ zero-crossing region. Gas in galaxies, however, shows
accelerations ranging from the same order of magnitude as the redshift drift up to 2 dex higher, which supports the
concern about systematic effects in 21 cm measurements.


3.11.5 Main results and forecasts 


Secular redshift drift measurements on their own will not compete with other “precision cosmology” probes in terms of
percent-level constraints on cosmological parameters. However, the method does offer a model-independent way
to directly detect the cosmic acceleration that does not rely on standard candles, standard rulers, or the cosmic
distance ladder, and therefore has completely different systematics from canonical cosmological probes. It is also a
powerful probe of isotropy and the general FLRW model (Quartin and Amendola, 2010).


Using the combined data and uncertainties for all methods shown in Fig. 44, we run an MCMC analysis to
forecast constraints on the parameters of three different cosmological models: (1) a flat ΛCDM model (with two free
parameters, $H_0$ and $\Omega_m$), (2) a geometrically unconstrained ΛCDM model (where $\Omega_\Lambda$ is also free to
vary), and (3) a flat $w_0w_a$CDM model (with four free parameters, namely $H_0$, $\Omega_m$, $w_0$, and $w_a$). The
fiducial parameter values used for the forecasts are $H_0 = 74$ km s$^{-1}$ Mpc$^{-1}$, $\Omega_m = 0.27$,
$\Omega_\Lambda = 0.73$, $w_0 = -1$, and $w_a = 0$. The less constrained models
show the largest uncertainties, in large part due to strong correlations between parameters. The best-constrained
cosmology is the flat ΛCDM model, which provides uncertainties on $H_0$ and $\Omega_m$ of ±2%. In the unconstrained ΛCDM
model, $H_0$, $\Omega_m$, and $\Omega_\Lambda$ have uncertainties of ~40% and are highly degenerate. Finally, the flat
$w_0w_a$CDM model shows a mixed picture with uncertainties of 17% in $H_0$, 8% in $\Omega_m$, ±0.1 in $w_0$, and 0.3 in
$w_a$, with strong correlation between all parameters. In analyses comparing various redshift drift measurement methods,
Martins et al. (2021) and Esteves et al. (2021) show that there is no “best” method and caution that the choice of
measurement should be tailored to specific science goals (e.g., constraining $\Omega_m$ versus the dark energy equation of state).
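
For readers who want to explore how such constraints and parameter correlations arise, the sketch below runs a simple Fisher-matrix forecast for the flat ΛCDM case. Note that this swaps the MCMC analysis described above for a Fisher approximation, and that the redshifts and the uniform 0.1 cm s$^{-1}$ yr$^{-1}$ uncertainties are hypothetical placeholders rather than the Fig. 44 data, so the resulting numbers are not expected to reproduce the values quoted in the text.

```python
import numpy as np

KM_PER_MPC = 3.0857e19
SEC_PER_YR = 3.15576e7
C_CM_S = 2.99792458e10

def vdot(z, h0, omega_m):
    """Apparent acceleration dv/dt0 in cm/s/yr for flat LCDM, Eq. (132)."""
    h0_yr = h0 / KM_PER_MPC * SEC_PER_YR
    e_z = np.sqrt(omega_m * (1.0 + z) ** 3 + 1.0 - omega_m)
    return C_CM_S * h0_yr * (1.0 - e_z / (1.0 + z))

def fisher_matrix(z, sigma, theta, steps):
    """Fisher matrix from central finite-difference derivatives of vdot."""
    theta = np.asarray(theta, dtype=float)
    columns = []
    for i, step in enumerate(steps):
        up, down = theta.copy(), theta.copy()
        up[i] += step
        down[i] -= step
        columns.append((vdot(z, *up) - vdot(z, *down)) / (2.0 * step))
    weighted = np.column_stack(columns) / sigma[:, None]
    return weighted.T @ weighted

# Hypothetical measurement redshifts and uncertainties (cm/s/yr).
z_obs = np.array([0.2, 0.55, 0.85, 1.5, 2.0, 3.0, 4.0, 5.0])
sigma_obs = np.full(z_obs.size, 0.1)

fiducial = (74.0, 0.27)   # (H0 in km/s/Mpc, Omega_m), as in Fig. 43
cov = np.linalg.inv(fisher_matrix(z_obs, sigma_obs, fiducial, steps=(0.1, 1e-3)))

frac_err = np.sqrt(np.diag(cov)) / np.array(fiducial)
corr = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
print("fractional 1-sigma errors (H0, Omega_m):", frac_err)
print("correlation coefficient:", corr)
```

The correlation coefficient gives a quick view of the $H_0$-$\Omega_m$ degeneracy; a Fisher estimate is only reliable when the posterior is close to Gaussian, which is one reason a full MCMC is preferable for the less constrained models.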


The correlation between parameters measured by the secular redshift drift suggests that this method would benefit
from joint analyses with other cosmological probes (non-standard and otherwise, see Sect. 4). For example, Alves
et al. (2019) combine the expected ELT, SKA 21 cm emission, and CHIME measurements to make individual and
joint forecasts for flat ΛCDM, $w$CDM, and $w_0w_a$CDM cosmologies, both with and without priors. When current or
future expected priors are included, cosmological parameter constraints of ~1% can be obtained. Moreover, Martins
et al. (2021) show that the redshift drift can break parameter degeneracies in traditional cosmological probes.


The larger impact of the secular redshift drift measurement is its ability to unambiguously and directly identify
the influence of dark energy on the Hubble expansion. This statement applies individually to any of the measurement
methods described above, including the Lyα forest technique that would only measure deceleration: the amplitude of
$\dot{z}$ changes dramatically in the absence of dark energy. Any method that can measure a non-zero cosmic acceleration
can differentiate between cosmologies with and without dark energy (as shown in Figs. 43 and 44).


3.12 Clustering of Standard Candles


For over a decade after the seminal 1998 papers (Perlmutter et al., 1998, 1999; Riess et al., 1998) SNe Ia have been 
one of the most important observables in cosmology. Their prominence as a probe of the background cosmology has
more recently been overshadowed by the large increase in the available data of both the CMB and of BAO in large-scale
structure. There are, however, two reasons why supernovae could return to the forefront of cosmology. First, the
Vera Rubin Observatory Legacy Survey of Space and Time (LSST, LSST Science Collaboration et al., 2009) should 
increase the available number of events by at least two orders of magnitude. Second, supernovae are also able to 
probe cosmology beyond the background level. 


There have been two approaches to extract information on linear perturbation parameters from supernovae. 
First, they can be used as probes of gravitational lensing. They can in fact be used both in the weak (Quartin 
et al., 2014; Castro and Quartin, 2014; Scovacricchi et al., 2017; Macaulay et al., 2017, 2020) and strong lensing 
regimes (Zumalacarregui and Seljak, 2018; Grillo et al., 2018, 2020). The main observable is the induced change in 
their scatter at a given redshift. The second approach is to measure the correlations between supernova magnitudes 
induced by the peculiar velocity field. This field can be computed to good precision in linear perturbation theory 
and is correlated to the density contrast (Hui and Greene, 2006a). Measurements of these correlations have been 
more recently explored in detail in a number of papers (Castro et al., 2016; Howlett et al., 2017; Garcia et al., 2020; 
Amendola and Quartin, 2021; Graziani et al., 2020). 


Here, we review this latter approach and the forecasts performed for future surveys. The advantage of peculiar
velocity measurements is that they are well described by linear perturbation theory, and that velocity and density
tracers have different degeneracies with the linear bias, making them very complementary.


3.12.1 Basic idea and equations 


The first measurement of peculiar velocity correlations in real supernova data was carried out by Gordon et al. (2007)
using 271 SNe, yielding $\sigma_8 = 0.79 \pm 0.22$. Castro et al. (2016) proposed a more thorough
methodology to extract peculiar velocity information from supernova data. Combining the SN velocity and SN
lensing observables in the JLA supernova catalog (Betoule et al., 2014), a joint measurement of $\sigma_8$ and
the growth rate index $\gamma$ was obtained from SN data alone. This included marginalization over 8 nuisance
parameters for both lensing and peculiar velocities and 4 other cosmological parameters. Fixing $\gamma$ instead resulted
in a tighter constraint on $\sigma_8$. It was also shown by Castro et al. (2016) that SN lensing and velocity
constraints were very complementary, with degeneracy directions differing by 60° in the ($\sigma_8$, $\gamma$) plane, and
that both SN lensing and velocities were also very complementary to the CMB growth of structure constraints.
A measurement of $f\sigma_8$ at low redshifts, where the dependence on cosmology is weak, was also obtained with SN
velocities by Huterer et al. (2017) and Boruah et al. (2020).


These measurements of the velocity power spectrum can be combined to great gain with measurements of the 
density power spectrum and the density-velocity cross-spectrum. This was first proposed by Howlett et al. (2017) 
(henceforth H17), who performed Fisher Matrix forecasts for measuring $f\sigma_8$ by combining density and velocity spectra, 
the former measured with galaxies and the latter with SNe. Similar forecasts were also performed by Palmese and Kim 
(2021) combining future standard siren and galaxy survey data. 


The above promising results prompted a study of the capabilities of Rubin to perform measurements of the velocity 
power spectrum with SNe. Garcia et al. (2020) investigated the constraints that could be achieved with Rubin using 
the official survey strategy under investigation by the collaboration at the time. As was known, that strategy was 
not optimal for SN science, and the inferred SN completeness using the SNANA code (Kessler et al., 2009) and the 
proposed quality cuts was very low both at low $z$ and for $z > 0.5$. Figure 46 illustrates this result (dubbed LSST 
Status Quo, or LSST SQ for short), as well as the completeness assumed by a few other recent works. Nevertheless, 
even without further refinements, this was already enough to achieve very interesting velocity measurements with 
Rubin. It was also shown in that paper that the velocity constraints in the $(\sigma_8, \gamma)$ plane exhibit moderate 
non-Gaussianity (they are banana-shaped, instead of ellipsoidal), and thus the Fisher Matrix forecasts on the errors were 
not very accurate. Since the combination of velocity and density spectra makes for much tighter constraints than 
using velocity alone, the Fisher Matrix results for the combined cases are expected to be a good approximation of the 
full likelihood results. 


[Figure 46 plot: SN completeness versus redshift z (0.0 to 0.8). Legend: ZTF max (30 s), ZTF max (120 s), Q21 Conserv., Q21 Aggressive, H17, A20, G20, LSST SQ.] 


Figure 46: Comparison of assumed SN completeness in forecasts. The dashed lines represent the maximum ZTF 
theoretical completeness using the limiting magnitude in the deepest filter for the standard 30 s exposure time 
and also for a possible 120 s exposure. The horizontal solid lines represent assumptions made in different works. For 
Rubin we also show results from the survey strategy as of 2019, which were obtained after applying the proposed 
photometric quality cuts for a 5-year survey (LSST SQ, for Status Quo), which greatly reduce the completeness. 


Garcia et al. (2020) also investigated how to improve the observing strategy, and found that the same observing 
time provides similar cosmological information whether one observes a larger area, or a smaller area over 
more years. In fact, it was shown that even with optimistic Rubin SN numbers, the SN velocity spectrum is still 
observed far from the Cosmic Variance regime, and for a broad range of SN number densities $n_s$ the uncertainties 
still scale as $n_s^{-1/2}$, which is the same power with which uncertainties generally scale with the survey area. This 
means that in terms of SN clustering, the most important feature is to have a high cadence in order to achieve higher 
SN completeness. Lochner et al. (2021) recently revisited the impact of different survey strategies on SN velocity 
measurements. 
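

The trade-off between area and survey duration can be made explicit in one line. As a rough sketch, assume (an illustrative assumption of ours, not a statement taken from Garcia et al. 2020) that at fixed cadence and completeness the SN number density grows linearly with the survey duration $t$, and that the total observing time is proportional to the product of the area $A$ and $t$: 

$$ \sigma \propto n_s^{-1/2} A^{-1/2} , \qquad n_s \propto t \quad \Rightarrow \quad \sigma \propto (A\,t)^{-1/2} . $$

Under these assumptions only the product $A\,t$ matters, so trading area for years leaves the uncertainty roughly unchanged, whereas a higher completeness raises $n_s$ at fixed $A\,t$ and therefore tightens the constraints. 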


Figure 46 also illustrates that the Zwicky Transient Facility (ZTF) would in principle be capable of observing a 
catalog of SN with high completeness for z < 0.3. In fact, recently a first measurement of the clustering of both core 
collapse and type Ia SN was performed by ZTF (Tsaprazi et al., 2021). 


The combined measurements of velocity and density spectra were also studied in Amendola and Quartin (2021) 
(henceforth A21), where a model-independent methodology was proposed to extract competitive constraints on $E(z)$ 
with almost no assumptions regarding the cosmological model at any stage of the analysis. It was shown, using 
SNe only both as density and velocity tracers, that it was possible to achieve 5-13% (9-40%) measurements in redshift 
bins of $\Delta z = 0.1$ up to at least $z = 0.6$. These results included marginalization over a large number of bias parameters, 
which were allowed to vary freely in both $z$ and $k$. It was also discussed that, using SNe, one cannot measure 
$H_0$ with this method. Moreover, the constraints on $E(z)$ blow up in the limit $z \to 0$. 


Quartin et al. (2021) (henceforth Q21) recently proposed to go further and analyze galaxy and supernova data in a 
more exhaustive way by using SNe both as density and velocity tracers. This combines the complementarity of the 
velocity measurements with the benefits of a multi-tracer analysis (see, e.g., Seljak, 2009; McDonald and Seljak, 2009; 
Abramo, 2011; Abramo and Leonard, 2013). Here, instead of different galaxy populations, the multiple tracers are 
galaxies and supernovae. This leads to 6 different power spectra (3 auto- and 3 cross-spectra), and thus it was dubbed 
the 6×2pt method. This was shown to increase the precision with respect to the 3×2pt methods studied in both 
H17 and A21, at no cost in terms of extra data being needed. This extra precision was 
achieved not only in the cosmological parameters, but also in the bias parameters, making this approach more robust 
to uncertainties in the galaxy bias. 


One should note that in general velocity tracers inhabit galaxies, so we can only observe the 
velocity field where there are galaxies. We therefore observe a mass-weighted velocity field, also referred 
to as the momentum field (Howlett, 2019): $\mathbf{p}(\mathbf{r}) = \mathbf{v}(\mathbf{r})\,[1 + \delta_g(\mathbf{r})]$. At larger scales the momentum and velocity 
fields coincide, but already at scales of ~0.1 h/Mpc the former picks up non-linearities arising from quadratic terms. 
Nevertheless, these can be modeled using perturbation theory in a straightforward manner, so we will neglect them 
here for simplicity. 


Let us denote with $\delta_m$ the density contrast of matter, and with $\delta_T = b_T \delta_m$ the density contrast of a tracer field 
of sources (subscript T) that are or can be standardized, e.g. SNe Ia, where $b_T$ is the bias, in general dependent in 
an unknown way on space and time. In the linear regime and in Fourier space, we know that, due to the continuity 
equation, the peculiar velocity field $\mathbf{v}$ and the density contrast of a tracer field are related by: 

$$ \mathbf{v} = i H \beta \frac{\mathbf{k}}{k^2 (1+z)} \delta_T , \tag{134} $$

where $\beta = f/b_T$, and $f = d\log\delta_m / d\log a$ is the growth rate. The only component of the velocity field that is 
observable is, however, the longitudinal velocity $v_\parallel = \mathbf{v}\cdot\mathbf{r}/r$ (although see Hotinli et al., 2019), so the relation 
becomes: 

$$ v_\parallel = i \frac{H \beta \mu}{k (1+z)} \delta_T , \tag{135} $$


where $\mu = \cos\theta_{k,r}$ is the cosine of the angle between $\mathbf{k}$ and the line of sight $\mathbf{r}$. From this expression we see that, if we can measure 
both $\delta_T$ and $v_\parallel$, we have access to the combination $H\beta$, assuming that we also know $k$, $\mu$ (and of course the redshift 
$z$). So in order to measure $H(z)$ we need to measure $\beta$: this can be estimated through the redshift distortion of 
the galaxy power spectrum. However, we also need to be able to convert the raw data of redshift and angles into $k$ 
and $\mu$. To solve this problem, we will make use of the fact that $k$, $\mu$ depend on the observables (redshift and angles) 
through the angular diameter distance $D_A$ and through $H(z)$ itself. We assume the Etherington relation between 
the luminosity distance $D_L$ and the angular diameter distance is valid, so that $D_L = D_A (1+z)^2$. We also assume 
that $D_L$ is measured directly from the standard candles, while $H_0$ is given by local measurements, so that we know 
the combination $H_0 D_L$. Although we could include the error on the estimation of $H_0 D_L$ in our formalism, we will 
show at the end that it is well below the other uncertainties, so we may neglect it. 
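

For readers who want the intermediate step behind Eqs. (134) and (135), the following is a short sketch of the derivation. It assumes the Fourier convention $\delta(\mathbf{x}) \propto \int d^3k\, \delta_{\mathbf{k}}\, e^{i\mathbf{k}\cdot\mathbf{x}}$ with comoving $\mathbf{k}$, a curl-free velocity field, and $a = 1/(1+z)$; these conventions are our assumptions, and they fix the overall sign and the factor $(1+z)$: 

$$
\begin{aligned}
\dot\delta_m + \frac{1}{a}\nabla\cdot\mathbf{v} = 0
&\;\Rightarrow\; \dot\delta_m + \frac{i}{a}\,\mathbf{k}\cdot\mathbf{v} = 0 , \\
\dot\delta_m = H f\,\delta_m , \quad \mathbf{v}\parallel\mathbf{k}
&\;\Rightarrow\; \mathbf{v} = i\,a H f\,\frac{\mathbf{k}}{k^2}\,\delta_m
 = i H \beta\,\frac{\mathbf{k}}{k^2 (1+z)}\,\delta_T , \\
v_\parallel = \frac{\mathbf{v}\cdot\mathbf{r}}{r}
&\;\Rightarrow\; v_\parallel = i\,\frac{H \beta \mu}{k (1+z)}\,\delta_T .
\end{aligned}
$$

The second line uses $\delta_m = \delta_T / b_T$ together with $\beta = f/b_T$, and the last line uses $\mathbf{k}\cdot\mathbf{r} = k\,r\,\mu$. 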


A peculiar velocity $v_\parallel$ (in units of $c$) induces a change in the luminosity distance $D_L$ given by (Hui and Greene, 
2006b): 

$$ \frac{\delta D_L}{D_L} = v_\parallel \left[ 2 - \frac{d\log D_L}{d\log(1+z)} \right] . \tag{136} $$

Since $m = M + 25 + 5\log_{10} D_L$, a small change in $D_L$ induces a change $\delta m$ in the apparent magnitude given by: 

$$ \frac{\delta D_L}{D_L} = \frac{\log 10}{5}\,\delta m , \tag{137} $$

so that finally the radial peculiar velocity of a standard candle is obtained from the scatter $\delta m$ of its apparent 
magnitude as: 

$$ v_\parallel = \frac{\log 10}{5} \left[ 2 - \frac{d\log D_L}{d\log(1+z)} \right]^{-1} \delta m . \tag{138} $$
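

The conversion from a magnitude fluctuation to a radial velocity in Eq. (138) can be sketched numerically. The example below is only an illustration under stated assumptions: it adopts a flat ΛCDM background with $H_0 = 70$ km/s/Mpc and $\Omega_m = 0.3$ (our choice of fiducial values), uses $D_L = (1+z)\chi$ so that $d\log D_L/d\log(1+z) = 1 + (1+z)c/(H\chi)$, and all function names are hypothetical. 

```python
import numpy as np

C_KMS = 299792.458  # speed of light in km/s


def hubble(z, h0=70.0, omega_m=0.3):
    """H(z) in km/s/Mpc for flat LCDM (fiducial values are assumptions)."""
    return h0 * np.sqrt(omega_m * (1.0 + z) ** 3 + 1.0 - omega_m)


def comoving_distance(z, h0=70.0, omega_m=0.3, n=2048):
    """Comoving distance in Mpc from a trapezoidal integration of c/H."""
    zz = np.linspace(0.0, z, n)
    integrand = C_KMS / hubble(zz, h0, omega_m)
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(zz)))


def dlogdl_dlog1pz(z, h0=70.0, omega_m=0.3):
    """d log D_L / d log(1+z) for D_L = (1+z) * chi in a flat universe."""
    chi = comoving_distance(z, h0, omega_m)
    return 1.0 + (1.0 + z) * C_KMS / (hubble(z, h0, omega_m) * chi)


def velocity_from_dm(delta_m, z, **cosmo):
    """Radial peculiar velocity (units of c) from a magnitude fluctuation."""
    bracket = 2.0 - dlogdl_dlog1pz(z, **cosmo)
    return np.log(10.0) / 5.0 * delta_m / bracket


def dm_from_velocity(v_par, z, **cosmo):
    """Inverse relation: magnitude fluctuation induced by v_par (units of c)."""
    bracket = 2.0 - dlogdl_dlog1pz(z, **cosmo)
    return 5.0 / np.log(10.0) * bracket * v_par


if __name__ == "__main__":
    for z in (0.05, 0.1, 0.3):
        v = velocity_from_dm(0.1, z)
        print(f"z = {z:.2f}: v_par = {v * C_KMS:9.1f} km/s for delta_m = 0.1 mag")
```

At low redshift the bracket is large and negative, so the same magnitude fluctuation maps to a much smaller velocity than at higher redshift; this is why nearby standard candles carry most of the velocity information. 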

Let us now consider three Gaussian fields in Fourier space with zero mean: the density contrast $\delta_s$ of the standard 
candles, their peculiar velocity field $v_s$, and the galaxy density contrast $\delta_g$. A fraction of the supernovae could be 
hosted by one of the galaxies in the sample, but we expect this fraction to be small. We consider the same growth 
rate $f$ for every tracer field, which equates to assuming universal gravity. We also introduce the linear bias 
for each species, $b_{g,s} = \delta_{g,s}/\delta_{\rm tot}$, where $\delta_{\rm tot}$ is the underlying total matter density contrast. The functions $b_{g,s}$ are in 
general arbitrary functions of space and time. Following Quartin et al. (2021) we can write the six observed power 
spectra as: 

$$ P_{gg}(k, \mu, z) = \Upsilon \left[1 + \beta_g \mu^2\right]^2 b_g^2\, S_g^2\, D_+^2\, P_{mm}(k) + \frac{1}{n_g} , \tag{139} $$

$$ P_{ss}(k, \mu, z) = \Upsilon \left[1 + \beta_s \mu^2\right]^2 b_s^2\, S_s^2\, D_+^2\, P_{mm}(k) + \frac{1}{n_s} , \tag{140} $$

$$ P_{gs}(k, \mu, z) = \Upsilon \left[1 + \beta_g \mu^2\right] \left[1 + \beta_s \mu^2\right] b_g b_s\, S_g S_s\, D_+^2\, P_{mm}(k) + \frac{n_{gs}}{n_g n_s} , \tag{141} $$

$$ P_{gv}(k, \mu, z) = \Upsilon\, \frac{H \mu}{k (1+z)} \left[1 + \beta_g \mu^2\right] b_g\, S_g S_v\, f\, D_+^2\, P_{mm}(k) , \tag{142} $$

$$ P_{sv}(k, \mu, z) = \Upsilon\, \frac{H \mu}{k (1+z)} \left[1 + \beta_s \mu^2\right] b_s\, S_s S_v\, f\, D_+^2\, P_{mm}(k) , \tag{143} $$

$$ P_{vv}(k, \mu, z) = \Upsilon \left[ \frac{H \mu}{k (1+z)} \right]^2 S_v^2\, f^2\, D_+^2\, P_{mm}(k) + \frac{\sigma_{v,\rm eff}^2}{n_s} , \tag{144} $$

where $\beta_i = f/b_i$, $\mu = \hat{\mathbf{k}}\cdot\hat{\mathbf{r}}$, $S_{g,v,s}$ are damping terms and $P_{mm}$ is the matter power spectrum at $z = 0$. 


All observed spectra are multiplied by a volume-correcting factor $\Upsilon$ (Ballinger et al., 1996; Seo and Eisenstein, 
2003), where: 

$$ \Upsilon = \frac{H\, D_{A,r}^2}{H_r\, D_A^2} , \tag{145} $$

because we need first to choose a reference cosmology, e.g. ΛCDM (subscript $r$), and then correct for any other 
cosmology. For the same reason, the AP effect (Alcock and Paczynski, 1979), which introduces corrections to $k$ and 
$\mu$ (see, e.g., Magira et al., 2000; Amendola et al., 2005), has also been taken into account. 


The non-linear smoothing factors $S_{v,g,s}$ (important only at small scales) can be taken following Koda et al. (2014); 
Howlett et al. (2017) to be: 

$$ S_{v,g,s} = \exp\left[ -\frac{1}{4} \left( k \mu \sigma_{v,g,s} \right)^2 \right] . \tag{146} $$

In this expression, $\sigma_{v,g,s}$ are assumed to be independent of redshift. Q21 set as fiducial values $\sigma_g = \sigma_s = 4.24$ Mpc/h 
and $\sigma_v = 8.5$ Mpc/h. H17 used very similar values. These fiducial values nevertheless have little impact on the 
forecasts. Finally, the noise term in the velocity correlation is given by (Hui and Greene, 2006b; Davis et al., 2011): 

$$ \sigma_{v,\rm eff}^2 = \left[ \frac{\log 10}{5}\, \sigma_{\rm int} \right]^2 \left[ 2 - \frac{d\log D_L}{d\log(1+z)} \right]^{-2} + \frac{\sigma_{v,\rm nonlin}^2}{c^2} , \tag{147} $$

where $\sigma_{\rm int}$ is the intrinsic scatter of the source’s magnitude. 


The 6×2pt method results in a 3 × 3 matrix of correlations: 

$$ C = \begin{pmatrix} P_{gg} & P_{gs} & P_{gv} \\ P_{gs} & P_{ss} & P_{sv} \\ P_{gv} & P_{sv} & P_{vv} \end{pmatrix} . \tag{148} $$

The probability distribution of our random variables, i.e. $x_a = \sqrt{V}\,\{\delta_g, \delta_s, v_s\}$, is assumed Gaussian with zero mean 
and covariance matrix given by $C$. The Fisher matrix associated to the unknown parameters is thus (Abramo and 
Amendola, 2019): 

$$ F_{\alpha\beta} = V V_k \bar{F}_{\alpha\beta} , \tag{149} $$

where $V_k = (2\pi)^{-3}\, 2\pi k^2 \Delta_k$ is a volume element in Fourier space and $\bar{F}_{\alpha\beta}$ is: 

$$ \bar{F}_{\alpha\beta} = \frac{1}{2} \int_{-1}^{+1} d\mu\, \frac{\partial C_{ab}}{\partial\theta_\alpha}\, C^{-1}_{ad}\, \frac{\partial C_{cd}}{\partial\theta_\beta}\, C^{-1}_{bc} , \tag{150} $$

where the integrand is evaluated at the fiducial value and $\theta_\alpha$ are the cosmological parameters we want to estimate. 
For a z-shell of volume $V(z)$ and for $\Delta_k \approx 2\pi/V^{1/3}$, we have: 

$$ V V_k = \frac{k^2 V^{2/3}}{2\pi} . \tag{151} $$

The k-cells were chosen in A21 and Q21 with equal $\Delta_k = 2\pi/V(z)^{1/3}$ between $k_{\min}(z)$ and $k_{\max}$, with $k_{\min} = 
2\pi/V(z)^{1/3}$ following Garcia et al. (2020). A21 and Q21 assumed $k_{\max} = 0.1$ h/Mpc, whereas H17 assumed 
$k_{\max} = 0.2$ h/Mpc (see Table 6). As discussed in A21, the latter value is responsible for substantial increases in 
precision. 


        Q21 Conservative                  | Q21 Aggressive                    | H17 All Rubin SNe 
z_bin   V       10^3 n_s   b_g    k_max   | V       10^3 n_s   b_g    k_max   | V       10^3 n_s   b_g    k_max 
0.05    0.019   0.048      1.38   0.1     | 0.046   0.096      1.38   0.1     | 0.046   0.143      1.38   0.2 
0.15    0.123   0.052      1.45   0.1     | 0.296   0.105      1.45   0.1     | 0.296   0.157      1.45   0.2 
0.25    0.303   0.057      1.53   0.1     | 0.727   0.114      1.53   0.1     | 0.727   0.172      1.53   0.2 
0.35    0.531   0.061      2.04   0.1     | 1.27    0.122      2.04   0.1     | 1.27    0.186      1.61   0.2 
0.45    -       -          -      -       | 1.88    0.131      2.15   0.1     | 1.88    0.200      1.69   0.2 
0.55    -       -          -      -       | 2.51    0.139      2.26   0.1     | -       -          -      - 
0.65    -       -          -      -       | 3.13    0.148      2.37   0.1     | -       -          -      - 

Units: V in [Gpc/h]^3, n_s in [h/Mpc]^3, k_max in h/Mpc. 


Table 6: Survey specifications for the proposed forecast scenarios in H17 and Q21. The assumed SN completeness 
in each case is represented in Figure 46. The supernova bias is assumed to be $1.0/D_+(z)$; the galaxy bias is assumed 
to be $1.34/D_+(z)$ for $z < 0.3$ (mostly BGs), $1.7/D_+(z)$ for $z > 0.3$ (mostly LRGs). The $z$ bins have $\Delta z = 0.1$ 
and are centered on $z_{\rm bin}$. 
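

To make the structure of Eqs. (148)-(151) concrete, here is a minimal numerical sketch of a single-bin Fisher forecast. It is not the Q21 pipeline: it sets $S = 1$, $\Upsilon = 1$, $D_+ = 1$ and $n_{gs} = 0$, varies only $(f, b_g, b_s)$, replaces $P_{mm}(k)$ with a toy analytic shape, and takes $n_g$, $b_s$, $\sigma_{\rm int}$ and $\sigma_{v,\rm nonlin}$ as assumed values; every function name is hypothetical. Derivatives are taken by central finite differences and the $\mu$ integral uses Gauss-Legendre quadrature. 

```python
import numpy as np

C_KMS = 299792.458  # speed of light [km/s]


def e_of_z(z, om=0.3):
    """Dimensionless H(z)/H0 for flat LCDM (fiducial om is an assumption)."""
    return np.sqrt(om * (1 + z) ** 3 + 1 - om)


def sigma_v_eff2(z, sigma_int, sigma_v_nl=300.0):
    """Velocity noise of Eq. (147), in units of c^2, for flat LCDM."""
    zz = np.linspace(0.0, z, 2001)
    inv_e = 1.0 / e_of_z(zz)
    chi = np.sum(0.5 * (inv_e[1:] + inv_e[:-1]) * np.diff(zz))  # chi * H0 / c
    bracket = 1.0 - (1 + z) / (e_of_z(z) * chi)  # equals 2 - dlogD_L/dlog(1+z)
    return (np.log(10) / 5 * sigma_int / bracket) ** 2 + (sigma_v_nl / C_KMS) ** 2


def toy_pmm(k):
    """Toy power-spectrum shape in (Mpc/h)^3; a stand-in, NOT a real P_mm(k)."""
    return 3.0e4 * (k / 0.02) / (1 + (k / 0.02) ** 2) ** 1.3


def covariance(theta, k, mu, z, n_g, n_s, sv2):
    """Matrix of Eq. (148) with S = 1, Upsilon = 1, D_+ = 1 and n_gs = 0."""
    f, b_g, b_s = theta
    amp = toy_pmm(k)
    a_v = 100.0 * e_of_z(z) / C_KMS * mu / (k * (1 + z))  # H mu / [k (1+z)], c = 1
    kern_g = b_g + f * mu ** 2  # b_g (1 + beta_g mu^2)
    kern_s = b_s + f * mu ** 2  # b_s (1 + beta_s mu^2)
    p_gg = kern_g ** 2 * amp + 1.0 / n_g
    p_ss = kern_s ** 2 * amp + 1.0 / n_s
    p_gs = kern_g * kern_s * amp
    p_gv = a_v * kern_g * f * amp
    p_sv = a_v * kern_s * f * amp
    p_vv = (a_v * f) ** 2 * amp + sv2 / n_s
    return np.array([[p_gg, p_gs, p_gv], [p_gs, p_ss, p_sv], [p_gv, p_sv, p_vv]])


def fisher(theta, z, volume, n_g, n_s, k_max=0.1, sigma_int=0.13, n_mu=16, eps=1e-4):
    """Fisher matrix of Eqs. (149)-(151) for theta = (f, b_g, b_s) in one z-bin."""
    theta = np.asarray(theta, dtype=float)
    dk = 2 * np.pi / volume ** (1.0 / 3.0)  # Delta_k, also used as k_min
    k_cells = np.arange(dk, k_max + 1e-12, dk)
    mus, weights = np.polynomial.legendre.leggauss(n_mu)  # quadrature on [-1, 1]
    args = (z, n_g, n_s, sigma_v_eff2(z, sigma_int))
    fish = np.zeros((theta.size, theta.size))
    for k in k_cells:
        vvk = k ** 2 * volume ** (2.0 / 3.0) / (2 * np.pi)  # Eq. (151)
        for mu, w in zip(mus, weights):
            c_inv = np.linalg.inv(covariance(theta, k, mu, *args))
            d_c = []
            for a in range(theta.size):  # central finite differences
                step = np.zeros_like(theta)
                step[a] = eps * theta[a]
                up = covariance(theta + step, k, mu, *args)
                down = covariance(theta - step, k, mu, *args)
                d_c.append((up - down) / (2 * step[a]))
            for a in range(theta.size):
                for b in range(theta.size):
                    fish[a, b] += 0.5 * w * vvk * np.trace(c_inv @ d_c[a] @ c_inv @ d_c[b])
    return fish


if __name__ == "__main__":
    # Q21 Aggressive bin at z = 0.25 from Table 6; n_g and b_s are assumed values.
    F = fisher([0.65, 1.53, 1.14], z=0.25, volume=0.727e9, n_g=1e-3, n_s=1.14e-4)
    errors = np.sqrt(np.diag(np.linalg.inv(F)))
    print("marginalized 1-sigma errors on (f, b_g, b_s):", errors)
```

Because the power spectrum here is a toy shape, the printed numbers are not forecasts and should not be compared with Table 8; the point is only the bookkeeping of covariance, derivatives, and mode counting. Setting `sigma_int` to a very large value effectively removes the velocity information, which gives a quick way to see how much the velocity spectra contribute in this simplified setting. 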


3.12.2 Measurements and sample selection 


The equations above and the results below assume spectroscopic measurements of both galaxies and supernovae. If 
one has to rely on photometric data only, the corresponding photo-z errors will degrade the clustering measurements 
along the line of sight, resulting in larger effective non-linear smoothing factors $S_{v,g,s}$. For supernovae, the absence 
of spectroscopic follow-up will result in contamination from core-collapse supernovae, which could be a source of 
bias, as discussed below. 


The need for galaxy spectra does not substantially decrease the final precision of the method since, due to cosmic 
variance, the information saturates at relatively low number densities, which should be reached with surveys like 
DESI (DESI Collaboration et al., 2016) and 4MOST (de Jong et al., 2019). For instance, in the Q21 Conservative 
case, only half a million galaxies with spectra would be required. This will only pose a real challenge in the cases 
where one tries to push to higher redshifts ($z \gtrsim 0.5$), since the absolute number of galaxies needed to get close to 
the cosmic-variance limit in each redshift bin increases roughly with $z^2$. 


3.12.3 Systematic effects 


The sources of systematic effects in the 6×2pt method are the same as for any supernova and large-scale structure 
survey. Here we limit ourselves to a list of the most important ones, referring to the literature for details. 


On the supernova side, one has to expect various sources of systematic uncertainties. For instance, one can 
incorrectly classify core-collapse supernovae or other transient phenomena as SNe Ia. This is especially problematic 
if SNe lack spectra, although there is an ongoing effort to improve photometric classification techniques (see, e.g., 
Lochner et al., 2016; Ishida et al., 2019; Villar et al., 2020). Without further improvements in photometric 
classification, extra dispersion would need to be included in the SN distances to avoid biases, which Vargas dos Santos et al. 
(2019) showed could lead to an effective reduction in the number of SNe by up to two thirds. 


z_bin                                    0.05   0.15   0.25   0.35   0.45 
H17 All Rubin SNe, σ(ln fσ8) [%]         10     5.8    4.1    3.3    2.8 


Table 7: Relative 1σ errors in $f\sigma_8$ (in percent) using the 3×2pt g-s method. Adapted from Howlett et al. (2017). 


Secondly, the standardization of SNe Ia might be more complicated than usually assumed, with dependencies 
on environment, host mass, metallicity, etc., that are still not perfectly accounted and corrected for. Gravitational 
lensing of the sources is another possible source of bias, although the overall effect is expected to be negligible. The 
smoothing factors $S_{g,s,v}$ that we introduced in the previous section might also deviate from the simple 
parameterization we adopted, perhaps with a redshift dependence. If the SNe Ia redshifts are evaluated through photometric 
methods, there are of course additional sources of uncertainty, which could however be modeled by larger, and 
redshift-dependent, smoothing factors. 


On the large-scale structure side, one should of course consider carefully other effects. First, finite surveys induce 
window-function distortions on the power spectrum shape that have to be taken into account, although for the 
forthcoming large surveys this problem is probably under control. Secondly, the redshift bins cannot really be taken 
as independent, and some correction is also expected (see, e.g., Bailoni et al., 2017). Moreover, lensing magnification 
also affects the clustering (see, e.g., Cardona et al., 2016). Perhaps the most problematic systematic effect is, however, 
the impact of non-linearities. The assumption of linearity in fact enters our calculation in several ways: in the 
$P(k)$ shape, in the Kaiser redshift correction, in the velocity-density contrast relation, and in the overall Gaussian 
assumption. The non-linearity is in principle accounted for by the smoothing factors $S_{v,g,s}$, but of course 
these functions are calibrated only through ΛCDM simulations and might differ appreciably in alternative cosmologies. 


3.12.4 Main results and forecasts 


The results obtained in H17 for the 3×2pt case using galaxies and supernovae are summarized in Tab. 7 for the case 
dubbed All Rubin SNe, which is described in detail in Tab. 6. Here we just recast the H17 results in wider redshift 
bins. As can be seen, constraints on $f\sigma_8$ between 3 and 10% can be achieved in that case. 


Forecasts of the 6×2pt method were performed in Q21 allowing a cosmological model with 5 parameters, 
$\{\sigma_8, \gamma, \Omega_m, \Omega_{k0}, h\}$, making use of 3 global nuisance parameters describing the non-linear smoothing factors in Eq. (146) and 
allowing each bias parameter to be free in each redshift bin. The final, marginalized constraints on each parameter 
are given in Tab. 8, and the 2-D contours in $\{\sigma_8, \gamma\}$ are depicted in Fig. 47. As discussed in Q21, neglecting the AP 
corrections or assuming flatness has little impact on the $\sigma_8$ and $\gamma$ constraints (the other parameters are affected to 
a higher degree). This figure also illustrates the CMB contours, which were extracted from Mantz et al. (2015). We 
point the reader to Q21 for more details. 


Figure 47: 1 and 2σ marginalized forecasts in $\{\sigma_8, \gamma\}$ from the 6×2pt method for the Q21 Conservative 
(left) and Q21 Aggressive (right panel) forecasts. Also shown are the CMB-only and joint constraints. As can be 
seen, the 6×2pt and CMB constraints are very complementary. Image adapted with permission from Quartin 
et al. (2021). 


1σ uncertainties in:   $\sigma_8$   $\gamma$   $h$     $\Omega_m$   $\Omega_{k0}$   $\langle \ln b_g \rangle$   $\langle \ln b_s \rangle$ 
Q21 Conservative       0.10         0.19       0.039   0.015        0.26            0.14                        0.15 
Q21 Aggressive         0.036        0.067      0.013   0.0047       0.074           0.050                       0.052 


Table 8: Fully marginalized absolute forecast uncertainties in each cosmological parameter using the 6×2pt 
method. The (relative) bias uncertainties are the average over all redshift bins, but their redshift dependence is 
small, only around ~10%. Adapted from Quartin et al. (2021). 


Figure 48: Errors in $H$ with the model-independent approach compared to using only galaxy clustering (red 
and green errorbars are slightly displaced for clarity). The two grey continuous lines represent $H(z)$ for $w = -0.9$ 
(top) and $w = -1.1$ (bottom), as a convenient graphical reference. 


Finally, using the methodology discussed in Amendola and Quartin (2021), one can employ the 6×2pt method 
also in another way, namely, to produce forecasts without assuming a parameterization of $H(z)$, $P(k, z)$ and $\beta_{g,s}(k, z)$. 
This is obtained by employing the data directly in every $k$, $z$-bin, therefore avoiding the need to assume a specific 
cosmological model. Q21 showed that one can obtain uncertainties on $E(z)$ around 3-4% in the farthest bin of the 
Aggressive survey, as shown in Fig. 48. 


4 Synergies and complementarities between cosmological probes 


In Sect. 3, we extensively discussed all the characteristics and peculiarities of the new emerging cosmological probes, 
individually. At the end of this review, it is useful to explore also the gain achievable from the synergy and 
complementarity of the various probes when combined together; this will allow us to assess if and how much 
they complement each other, and what we could learn from studying them jointly. 


The first important point to look at is the redshift range specifically mapped by each probe, shown in Fig. 49. 
The horizontal bands show the general range of the various methods as discussed in the corresponding sections, either 
currently covered or expected to be covered with future surveys. The dots show current measurements while the 
crosses indicate the forecasts. In some cases, a cosmological probe carries integrated information from a larger 
redshift, as in the case of TDC and CCSL, where the measured sources lie at a much larger distance than the 
lenses, or of SA, which provide information on the entire expansion history up to the formation redshift of the star 
considered; we display that information with arrows in the plot. The various methods have been ordered from top 
to bottom as a function of the spanned range. In the bottom part of the plot the main cosmological probes, namely 
CMB, BAO and SNe, are shown for comparison. The first thing that is interesting to notice is how the new 
emerging cosmological probes richly complement the main probes, covering different ranges of cosmic time, from the 
very local ones ($z < 0.1$ for SA, SBF, and SS) up to very high redshifts (up to $z \sim 10$-$12$ for QSO, GRB, 
NHIM, and RD). They allow us to span almost 13.4 Gyr of cosmic time, a significantly larger range than the one 
reachable by current probes. Moreover, it is also relevant that a significant fraction of these methods overlap with the 
range of BAO and SNe ($0.1 < z < 2$ for CC, CSC, CV, CCSL, and TDC), providing a crucially wider compilation 
of late-Universe probes that can prove decisive in breaking the dichotomy between late- and early-Universe results, 
and in validating the results obtained from standard probes. 


Figure 49: Redshift distribution of the various emerging cosmological probes considered in this review. From 
top to bottom are shown stellar ages (SA), surface brightness fluctuations (SBF), standard sirens (SS), cosmic 
chronometers (CC), quasars (QSO), gamma-ray bursts (GRB), clustering of standard candles (CSC), cosmic voids 
(CV), neutral hydrogen intensity mapping (NHIM), secular redshift drift (RD), cosmography with cluster strong 
lensing (CCSL), and time delay cosmography (TDC). The horizontal bands show, for each probe, the expected 
redshift range considering both current and future measurements, where the dots represent current measurements 
described in the review and the crosses the forecasts. The arrows indicate when a probe carries integrated 
information from a larger redshift, as in the case of stellar ages, mapping all the expansion history since their 
formation, or TDC and CCSL, carrying the information not only of the lenses (dots) but also of the 
sources. In the lower part of the figure is shown, for comparison, the redshift distribution of the main cosmological 
probes, namely baryon acoustic oscillations (BAO), supernovae (SNe), and cosmic microwave background (CMB). 
Probe: cosmic chronometers 
Measure: $H(z)$ 
Strength: independent of cosmological model assumptions; differential approach mitigates several potential biases 
Weakness: need to be calibrated on SPS models; CC sample has to be accurately selected to minimize contamination 


Probe: quasars 
Measure: $D_L(z)$ 
Strength: detectable over a wide redshift range; cosmology-independent estimate of $D_L$ 
Weakness: large scatter of the Hubble diagram when compared to SNe Ia; small statistics at redshifts z > 4 


Probe: gamma-ray bursts 
Measure: $D_L(z)$ 
Strength: detectable over a wide redshift range; poorly affected by dust or gas absorption; no evidence of evolutionary effects; independent measure of $D_L$; correlations can be self-calibrated using expected large datasets in the near future 
Weakness: used correlations cannot be currently calibrated on nearby events (very low number / peculiar events); GRB prompt gamma-ray emission physics still to be fully understood 


Probe: standard sirens/GWs 
Strength: no calibration other than GR; independent measurement of $H_0$ 
Weakness: need deep and complete catalogs for dark sirens, quick search and follow-up for bright sirens 


Probe: time-delay cosmography 
Measure: $D_{\Delta t}$, $D_A(z_d, z_s)$ 
Strength: cosmology-independent direct measurement of $D_{\Delta t}$, an absolute angular diameter distance product, to measure $H_0$ 
Weakness: breaking lensing degeneracies requires high-precision ancillary data products (i.e. dynamic measurements) 


Probe: cosmography with cluster strong lensing 
Strength: one-step measurement, sensitive to both $H_0$ and Universe geometry; less prone to inherent lensing systematics 
Weakness: complexity of lens modeling, time-consuming spectroscopic and monitoring observational campaigns 


Probe: cosmic voids 
Measure: $D_A H(z)$, $f\sigma_8(z)$ 
Strength: cosmology independent; pure geometry; linear dynamics; orthogonal to other probes 
Weakness: need large contiguous survey volumes; accurate tracer redshifts for RSD & AP test 


Probe: HI intensity mapping 
Measure: $P_{\rm HI}(k, z)$, $P_{{\rm HI},g}(k, z)$ 
Strength: galaxy evolution and cosmology, large volumes, high redshifts 
Weakness: foreground contamination, RFI, instrumental systematics 


Probe: surface brightness fluctuations 
Measure: $H_0$ 
Strength: can be empirically (Cepheids or TRGB) and theoretically (SPS) calibrated; small internal scatter (band-dependent); Hubble flow reached in one single observation (no temporal monitoring) 
Weakness: biased by dust or young stars, if any; practical limit with space-based resolution is around ~200 Mpc; accurate color measurements for calibration 


Probe: stellar ages 
Measure: $t_U$, $H_0$ 
Strength: no cosmological assumption; direct and complementary local probe 
Weakness: need further assessment to reduce systematics involved in stellar ages determination; need larger sample to increase the accuracy 


Probe: redshift drift 
Strength: cosmology-independent; direct test of dark energy 
Weakness: small signal; very long timeline; stability of the measurement 


Probe: clustering of standard candles 
Strength: makes use of data that will be available naturally from SNe and redshift surveys; model-independent measurement of $E(z)$ 
Weakness: SNe systematics may bias distance measurements, which are used by the method 


Table 9: Summary table of the emerging cosmological probes, highlighting for each one what observable they 
are constraining, comparing their strengths and weaknesses. 
Beyond a different redshift distribution, each probe has its own strengths and weaknesses: in Tab. 9 we summarize 
them, presenting also which quantity they primarily constrain. From the table, their wide diversity is evident. 
In the first place, SBF and stellar ages, as also highlighted in Fig. 49, mostly analyze very local 
samples, and as a consequence will be particularly relevant in constraining local cosmological parameters. The 
advantage is that they require no cosmology-dependent calibrations, being based either on the direct estimate of the 
stellar age, or on calibration on other observables, with a very small scatter. They represent ideal methods to obtain 
complementary local estimates of the Hubble constant $H_0$ and the age of the Universe $t_U$. Similarly, standard sirens 
provide a direct and cosmology-independent estimate of the luminosity distance of sources detected through GWs 
that can, when a counterpart is identified directly (bright sirens) or statistically (dark sirens), lead to another direct 
and independent measurement of $H_0$; moreover, as described in Sect. 3.4, future observing runs and GW observatories 
will also allow a measurement of the Hubble parameter $H(z)$, with an accuracy comparable to, and competitive with, 
other methods. 


Moving to higher redshifts, to better understand the strengths of the various methods it is useful to take into 
account also the redshift dependence of the cosmological components. The dark energy component, in particular, 
dominates at smaller redshifts, $z < 0.5$, while at larger redshift the contribution of matter, or of an evolving dark 
energy component, starts to become more significant. From this point of view, GRB and QSO represent optimal 
extensions of the Hubble diagram with respect to SNe, being able to measure the luminosity distance up to $z \sim 8$-$12$, 
providing ideal samples to test possible deviations from a standard ΛCDM model. 
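

The redshift dependence of the dark energy contribution can be illustrated with a few lines of code. The sketch below assumes a flat universe with matter and a dark energy component of constant equation of state $w$ (radiation neglected), with $\Omega_m = 0.3$ as an illustrative fiducial value; the function names are hypothetical. 

```python
import numpy as np


def omega_de(z, omega_m0=0.3, w=-1.0):
    """Dark-energy density fraction at redshift z for flat wCDM (constant w)."""
    z = np.asarray(z, dtype=float)
    matter = omega_m0 * (1 + z) ** 3
    dark_energy = (1 - omega_m0) * (1 + z) ** (3 * (1 + w))
    return dark_energy / (matter + dark_energy)


def z_equality(omega_m0=0.3, w=-1.0):
    """Redshift at which matter and dark-energy densities are equal."""
    return ((1 - omega_m0) / omega_m0) ** (-1.0 / (3 * w)) - 1


if __name__ == "__main__":
    for z in (0.0, 0.5, 1.0, 2.0):
        print(f"z = {z:.1f}: Omega_DE = {float(omega_de(z)):.3f}")
    print(f"matter / dark-energy equality at z = {z_equality():.3f}")
```

With these assumed parameters the two densities are equal near $z \approx 0.33$, and the dark energy fraction falls below 10% by $z = 2$, which is why probes reaching high redshift are mostly sensitive to the matter content or to a possible evolution of dark energy. 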


On the other hand, the strength of CC, RD and CSC is to provide multiple cosmology-independent estimates of 
$H(z)$ (or $E(z)$) that can constrain the expansion history of the Universe up to $z \sim 2$ with minimal assumptions, not 
needing to assume a specific background model (as done, e.g., for SNe or BAO) but just a FLRW metric. Fig. 50 
shows a comparison of the forecasts of future $H(z)$ measurements that can be obtained with these cosmological 
probes; the various forecasts have been presented in Sects. 3.1, 3.4, 3.11, and 3.12. It is interesting to notice that in 
the range $0 < z < 1$ we will have in the future several methods that will provide a percent or sub-percent accuracy 
measurement of the Hubble parameter in a cosmology-independent fashion. This will be crucial, because having 
multiple independent probes will allow us to check for consistency and keep systematic errors under control. At the 
same time, such an accuracy over a wide redshift range will provide an ideal dataset to test a wide range of 
cosmological models and constrain the components of our Universe. 
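

As a simple illustration of what a consistency check between independent probes could look like, the sketch below combines several measurements of the same quantity (for example $H$ at a common redshift) with inverse-variance weights and returns the $\chi^2$ of the measurements about the combined value. It assumes Gaussian, mutually uncorrelated errors, which a real analysis would have to verify or replace with a full covariance matrix; the input numbers in the usage lines are hypothetical placeholders, not measurements or forecasts from this review. 

```python
import numpy as np


def combine_independent(values, sigmas):
    """Inverse-variance weighted mean, its error, chi^2 and degrees of freedom."""
    values = np.asarray(values, dtype=float)
    weights = 1.0 / np.asarray(sigmas, dtype=float) ** 2
    mean = float(np.sum(weights * values) / np.sum(weights))
    sigma = float(1.0 / np.sqrt(np.sum(weights)))
    chi2 = float(np.sum(weights * (values - mean) ** 2))
    return mean, sigma, chi2, values.size - 1


if __name__ == "__main__":
    # Hypothetical placeholder inputs in km/s/Mpc, for illustration only.
    mean, sigma, chi2, dof = combine_independent([70.0, 72.0, 68.0], [1.0, 2.0, 2.0])
    print(f"combined = {mean:.2f} +/- {sigma:.2f}, chi2/dof = {chi2:.2f}/{dof}")
```

A $\chi^2$ much larger than the number of degrees of freedom would flag a tension between probes, or unaccounted systematics, before any combined constraint is quoted. 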


[Figure 50 plot: H(z) [km/s/Mpc] versus redshift (0.00 to 2.25). Legend: CC - combined, SS - 5 year, CSC - 6x2 aggressive, RD.] 


Figure 50: Forecasts of the Hubble parameter $H(z)$ with future measurements from cosmic chronometers (CC), 
standard sirens (SS), clustering of standard candles (CSC), and redshift drift (RD), as discussed in the 
corresponding sections. The colored points show the forecasts, with bands helping the visualization. 


Figure 51: Current constraints on cosmological parameters from the various cosmological probes covered in this 
review, namely cosmic chronometers (CC), quasars (QSO), standard sirens (SS), time delay cosmography (TDC), 
surface brightness fluctuations (SBF), cosmic voids (CV), cosmography with cluster strong lensing (CCSL, SN 
Refsdal case, Grillo et al., 2020), gamma-ray bursts (GRB, “Amati” relation), and stellar ages (SA). The figure 
shows the contour plot in the $H_0$-$\Omega_m$ plane for a flat ΛCDM cosmology, with their marginalized projection; the 
darker and lighter contours show the 68% and 95% confidence levels, respectively. In the case of QSO, as discussed 
in Sect. 3.2, information from SNe Ia has also been added to normalize the Hubble diagram; for SA, a Gaussian 
prior $\Omega_m = 0.3 \pm 0.02$ is assumed (Jimenez et al., 2019). The dashed lines indicate, for illustrative purposes, the 
values $H_0 = 70$ km s$^{-1}$ Mpc$^{-1}$ and $\Omega_m = 0.3$. 


To conclude, Fig. 51 shows, on a common plane, the current constraints achieved from the various cosmological
probes discussed in this review. Since, as discussed above, different probes are sensitive to different parameters,
we decided to explore a parameter space that maximizes the number of probes available, and in particular the
constraints that can be obtained in a flat ΛCDM cosmology where the parameters free to vary are the Hubble constant
H₀ and Ωₘ. With this plot it is possible to explore, on this plane, the complementarity and synergy between
the various emerging cosmological probes. As a first point, we notice that, as also discussed previously, there are
methods providing a constraint on only one of the two parameters. This is the case of CV and QSO, which cannot
measure H₀, and of SBF, SS, and partially SA, which cannot measure Ωₘ, as discussed in the previous sections. It is
useful to underline here that, in the case of QSO, we have also included in the constraints data from SNe, as
discussed in Sect. 3.2, and that in the case of SA we have assumed a Gaussian prior Ωₘ = 0.3 ± 0.02 as in Jimenez et
al. (2019). There are also probes that are sensitive to both H₀ and Ωₘ, namely CC, TDC, CCSL, shown considering the
SN Refsdal case (see Sect. 3.6; Grillo et al., 2020), and GRB, here exploited with the “Amati relation” approach (see
Sect. 3.3).
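As a reminder of what the two free parameters control, the expansion rate in a flat ΛCDM cosmology can be written as H(z) = H₀ [Ωₘ (1+z)³ + (1 - Ωₘ)]^(1/2). The short Python sketch below evaluates this expression at the illustrative values H₀ = 70 km s⁻¹ Mpc⁻¹ and Ωₘ = 0.3 marked by the dashed lines in Fig. 51. It is only a minimal illustration: the function name is hypothetical, and the radiation contribution is neglected, which is an assumption adequate only at the low redshifts considered here.

```python
import math


def hubble_flat_lcdm(z, h0=70.0, omega_m=0.3):
    """Expansion rate H(z) in km/s/Mpc for a flat LCDM cosmology.

    Assumptions (not from the review): radiation is neglected, so the
    dark energy density parameter is simply 1 - omega_m.
    """
    return h0 * math.sqrt(omega_m * (1.0 + z) ** 3 + (1.0 - omega_m))


if __name__ == "__main__":
    # Redshifts chosen only to span the range plotted in Fig. 50.
    for z in (0.0, 0.5, 1.0, 2.0):
        print(f"z = {z:.1f}  H(z) = {hubble_flat_lcdm(z):.1f} km/s/Mpc")
```

At fixed H₀, a larger Ωₘ makes H(z) rise faster with redshift, which is why probes measuring H(z) or distances over a range of redshifts can constrain both parameters, while probes limited to the very local Universe mostly constrain H₀.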


Two points are worth underlining here. The first one is that all probes, despite different accuracies, are converging
on a common part of the H₀-Ωₘ plane. Given the extreme diversity of the methods considered, this is very
relevant because it opens the possibility of combining different probes to improve the accuracy on the estimated
parameters. This combination is, at the moment, beyond the scope of this review, because it will need to carefully
address all possible systematics and covariances between the various probes, but Fig. 51 appears extremely promising
for such a combination. The second one is that the various probes also present a significant degree of orthogonality,
due to the different sensitivities discussed above. This proved extremely important in the past, when
the high accuracy reached by the main probes was mainly based on the orthogonality between the constraints
from SNe, BAO and CMB (see, e.g., Scolnic et al., 2018). Finding a similar level of complementarity among
the new emerging probes bodes well for the use of these new methods in modern cosmology, to
better constrain cosmological parameters, provide additional evidence to help solve current tensions, keep under
control systematic effects of both the main and the new probes, and, potentially, discover new physics.
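
The benefit of orthogonal constraints can be illustrated with a toy calculation. The Python sketch below combines two independent Gaussian constraints in the (H₀, Ωₘ) plane by summing their inverse covariance (Fisher) matrices. All numbers are hypothetical and are not taken from Fig. 51; the helper names are invented for this illustration, and the probes are assumed to be exactly independent, which is precisely the assumption that a real combination would have to verify by addressing systematics and covariances.

```python
import math


def invert_2x2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def combine_independent(cov_list):
    """Combine independent Gaussian constraints: sum the Fisher
    (inverse covariance) matrices, then invert the sum."""
    fisher = [[0.0, 0.0], [0.0, 0.0]]
    for cov in cov_list:
        inv = invert_2x2(cov)
        for i in range(2):
            for j in range(2):
                fisher[i][j] += inv[i][j]
    return invert_2x2(fisher)


# Hypothetical toy inputs: same marginal errors, opposite degeneracy directions.
sig_h, sig_m, rho = 3.0, 0.05, 0.9
off = rho * sig_h * sig_m
cov_a = [[sig_h**2, off], [off, sig_m**2]]    # H0 and Omega_m correlated
cov_b = [[sig_h**2, -off], [-off, sig_m**2]]  # H0 and Omega_m anti-correlated

cov_orthogonal = combine_independent([cov_a, cov_b])
cov_aligned = combine_independent([cov_a, cov_a])  # same degeneracy twice

print("sigma(H0), orthogonal probes:", math.sqrt(cov_orthogonal[0][0]))
print("sigma(H0), aligned probes:   ", math.sqrt(cov_aligned[0][0]))
```

In this toy setup, two probes with the same degeneracy direction only reduce the error on H₀ by a factor √2, whereas two probes with opposite degeneracy directions reduce it by a further factor that grows as the individual correlations become stronger.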
5 Summary and conclusions


In this article, we have reviewed the new emerging cosmological probes that are contributing (and are expected to 
contribute in the near future) to modern cosmology. In particular, we have discussed cosmic chronometers, quasars, 
gamma-ray bursts, gravitational waves used as standard sirens, time-delay cosmography, cosmography with cluster 
strong lensing, cosmic voids, neutral hydrogen intensity mapping, surface brightness fluctuations, redshift drift, and 
clustering of standard candles. For each cosmological probe, we presented the main equations involved in the
method, described how a sample can be selected and the method applied, reviewed the main results and expected
forecasts, and discussed the systematics involved, also showing possible ways to mitigate or minimize them.


In Sect. 4 we summarized the synergies and complementarities of those probes, presenting in Tab. 9 the main
strengths and weaknesses of each probe, in Fig. 49 their redshift distribution compared to that of BAO, SNe and
CMB, and in Fig. 51 the current constraints obtained from each probe in the H₀-Ωₘ plane.


These emerging cosmological probes represent a valuable resource for the coming years, since they could allow us
to go beyond the main cosmological probes currently exploited (SNe, BAO, CMB, weak lensing). In particular,
they will provide crucial additional information to check for possible systematics in current analyses, increase the
number of independent cosmological measurements, and give new hints to address the current tensions in
cosmology, possibly strengthening the case for new physics. As also shown in Fig. 50, these probes will provide
an important dataset in the future to obtain constraints on the expansion history of the Universe at the percent
precision independently of assumptions on a particular cosmological model, making them ideal complements to
the excellent results we are obtaining from the other main probes. The exploitation of new and complementary
cosmological probes will also be fundamental in view of the new surveys and missions that are currently ongoing
or planned, such as SDSS BOSS Data Release 16 (Ahumada et al., 2020), DESI (DESI Collaboration et al., 2016),
Gaia (Gaia Collaboration et al., 2016), JWST (Gardner et al., 2006), Euclid (Laureijs et al., 2011), PFS (Takada
et al., 2014), the Nancy Grace Roman Space Telescope (Spergel et al., 2015), the Vera Rubin Observatory (LSST
Science Collaboration et al., 2009), the next LIGO-Virgo-KAGRA observing runs (LIGO Scientific Collaboration et al.,
2015; Acernese et al., 2015; Akutsu et al., 2020) and future GW experiments like Cosmic Explorer (Reitze et al.,
2019) and the Einstein Telescope (Punturo et al., 2010), the MIGHTEE survey (Paul et al., 2021; Chen et al., 2021),
ASKAP (Wolz et al., 2017a), SPHEREx (Doré et al., 2014), and the ATLAS mission (Wang et al., 2019).